.. SPDX-FileCopyrightText: 2019-2020 Intel Corporation .. .. SPDX-License-Identifier: CC-BY-4.0 .. _onemkl_blas_gemm_batch: gemm_batch ========== Computes a group of ``gemm`` operations. .. _onemkl_blas_gemm_batch_description: .. rubric:: Description The ``gemm_batch`` routines are batched versions of :ref:`onemkl_blas_gemm`, performing multiple ``gemm`` operations in a single call. Each ``gemm`` operation perform a matrix-matrix product with general matrices. ``gemm_batch`` supports the following precisions. .. list-table:: :header-rows: 1 * - T * - ``float`` * - ``double`` * - ``std::complex`` * - ``std::complex`` .. _onemkl_blas_gemm_batch_buffer: gemm_batch (Buffer Version) --------------------------- .. rubric:: Description The buffer version of ``gemm_batch`` supports only the strided API. The strided API operation is defined as: :: for i = 0 … batch_size – 1 A, B and C are matrices at offset i * stridea, i * strideb, i * stridec in a, b and c. C := alpha * op(A) * op(B) + beta * C end for where: op(X) is one of op(X) = X, or op(X) = X\ :sup:`T`, or op(X) = X\ :sup:`H`, ``alpha`` and ``beta`` are scalars, ``A``, ``B``, and ``C`` are matrices, op(``A``) is ``m`` x ``k``, op(``B``) is ``k`` x ``n``, and ``C`` is ``m`` x ``n``. The ``a``, ``b`` and ``c`` buffers contain all the input matrices. The stride between matrices is given by the stride parameter. The total number of matrices in ``a``, ``b`` and ``c`` buffers is given by the ``batch_size`` parameter. **Strided API** .. rubric:: Syntax .. code-block:: cpp namespace oneapi::mkl::blas::column_major { void gemm_batch(sycl::queue &queue, onemkl::transpose transa, onemkl::transpose transb, std::int64_t m, std::int64_t n, std::int64_t k, T alpha, sycl::buffer &a, std::int64_t lda, std::int64_t stridea, sycl::buffer &b, std::int64_t ldb, std::int64_t strideb, T beta, sycl::buffer &c, std::int64_t ldc, std::int64_t stridec, std::int64_t batch_size) } .. code-block:: cpp namespace oneapi::mkl::blas::row_major { void gemm_batch(sycl::queue &queue, onemkl::transpose transa, onemkl::transpose transb, std::int64_t m, std::int64_t n, std::int64_t k, T alpha, sycl::buffer &a, std::int64_t lda, std::int64_t stridea, sycl::buffer &b, std::int64_t ldb, std::int64_t strideb, T beta, sycl::buffer &c, std::int64_t ldc, std::int64_t stridec, std::int64_t batch_size) } .. container:: section .. rubric:: Input Parameters queue The queue where the routine should be executed. transa Specifies op(``A``) the transposition operation applied to the matrices ``A``. See :ref:`onemkl_datatypes` for more details. transb Specifies op(``B``) the transposition operation applied to the matrices ``B``. See :ref:`onemkl_datatypes` for more details. m Number of rows of op(``A``) and ``C``. Must be at least zero. n Number of columns of op(``B``) and ``C``. Must be at least zero. k Number of columns of op(``A``) and rows of op(``B``). Must be at least zero. alpha Scaling factor for the matrix-matrix products. a Buffer holding the input matrices ``A`` with size ``stridea`` * ``batch_size``. lda The leading dimension of the matrices ``A``. It must be positive. .. list-table:: :header-rows: 1 * - - ``A`` not transposed - ``A`` transposed * - Column major - ``lda`` must be at least ``m``. - ``lda`` must be at least ``k``. * - Row major - ``lda`` must be at least ``k``. - ``lda`` must be at least ``m``. stridea Stride between different ``A`` matrices. b Buffer holding the input matrices ``B`` with size ``strideb`` * ``batch_size``. ldb The leading dimension of the matrices``B``. It must be positive. .. list-table:: :header-rows: 1 * - - ``B`` not transposed - ``B`` transposed * - Column major - ``ldb`` must be at least ``k``. - ``ldb`` must be at least ``n``. * - Row major - ``ldb`` must be at least ``n``. - ``ldb`` must be at least ``k``. strideb Stride between different ``B`` matrices. beta Scaling factor for the matrices ``C``. c Buffer holding input/output matrices ``C`` with size ``stridec`` * ``batch_size``. ldc The leading dimension of the matrices ``C``. It must be positive and at least ``m`` if column major layout is used to store matrices or at least ``n`` if column major layout is used to store matrices. stridec Stride between different ``C`` matrices. Must be at least ``ldc`` * ``n``. batch_size Specifies the number of matrix multiply operations to perform. .. container:: section .. rubric:: Output Parameters c Output buffer, overwritten by ``batch_size`` matrix multiply operations of the form ``alpha`` * op(``A``)*op(``B``) + ``beta`` * ``C``. .. container:: section .. rubric:: Notes If ``beta`` = 0, matrix ``C`` does not need to be initialized before calling ``gemm_batch``. .. container:: section .. rubric:: Throws This routine shall throw the following exceptions if the associated condition is detected. An implementation may throw additional implementation-specific exception(s) in case of error conditions not covered here. :ref:`oneapi::mkl::invalid_argument` :ref:`oneapi::mkl::unsupported_device` :ref:`oneapi::mkl::host_bad_alloc` :ref:`oneapi::mkl::device_bad_alloc` :ref:`oneapi::mkl::unimplemented` .. _onemkl_blas_gemm_batch_usm: gemm_batch (USM Version) --------------------------- .. rubric:: Description The USM version of ``gemm_batch`` supports the group API and strided API. The group API operation is defined as: :: idx = 0 for i = 0 … group_count – 1 for j = 0 … group_size – 1 A, B, and C are matrices in a[idx], b[idx] and c[idx] C := alpha[i] * op(A) * op(B) + beta[i] * C idx = idx + 1 end for end for The strided API operation is defined as :: for i = 0 … batch_size – 1 A, B and C are matrices at offset i * stridea, i * strideb, i * stridec in a, b and c. C := alpha * op(A) * op(B) + beta * C end for where: op(X) is one of op(X) = X, or op(X) = X\ :sup:`T`, or op(X) = X\ :sup:`H`, ``alpha`` and ``beta`` are scalars, ``A``, ``B``, and ``C`` are matrices, op(``A``) is ``m`` x ``k``, op(``B``) is ``k`` x ``n``, and ``C`` is ``m`` x ``n``. For group API, ``a``, ``b`` and ``c`` arrays contain the pointers for all the input matrices. The total number of matrices in ``a``, ``b`` and ``c`` are given by: .. math:: total\_batch\_count = \sum_{i=0}^{group\_count-1}group\_size[i] For strided API, ``a``, ``b``, ``c`` arrays contain all the input matrices. The total number of matrices in ``a``, ``b`` and ``c`` are given by the ``batch_size`` parameter. **Group API** .. rubric:: Syntax .. code-block:: cpp namespace oneapi::mkl::blas::column_major { sycl::event gemm_batch(sycl::queue &queue, onemkl::transpose *transa, onemkl::transpose *transb, std::int64_t *m, std::int64_t *n, std::int64_t *k, T *alpha, const T **a, std::int64_t *lda, const T **b, std::int64_t *ldb, T *beta, T **c, std::int64_t *ldc, std::int64_t group_count, std::int64_t *group_size, const sycl::vector_class &dependencies = {}) } .. code-block:: cpp namespace oneapi::mkl::blas::row_major { sycl::event gemm_batch(sycl::queue &queue, onemkl::transpose *transa, onemkl::transpose *transb, std::int64_t *m, std::int64_t *n, std::int64_t *k, T *alpha, const T **a, std::int64_t *lda, const T **b, std::int64_t *ldb, T *beta, T **c, std::int64_t *ldc, std::int64_t group_count, std::int64_t *group_size, const sycl::vector_class &dependencies = {}) } .. container:: section .. rubric:: Input Parameters queue The queue where the routine should be executed. transa Array of ``group_count`` ``onemkl::transpose`` values. ``transa[i]`` specifies the form of op(``A``) used in the matrix multiplication in group ``i``. See :ref:`onemkl_datatypes` for more details. transb Array of ``group_count`` ``onemkl::transpose`` values. ``transb[i]`` specifies the form of op(``B``) used in the matrix multiplication in group ``i``. See :ref:`onemkl_datatypes` for more details. m Array of ``group_count`` integers. ``m[i]`` specifies the number of rows of op(``A``) and ``C`` for every matrix in group ``i``. All entries must be at least zero. n Array of ``group_count`` integers. ``n[i]`` specifies the number of columns of op(``B``) and ``C`` for every matrix in group ``i``. All entries must be at least zero. k Array of ``group_count`` integers. ``k[i]`` specifies the number of columns of op(``A``) and rows of op(``B``) for every matrix in group ``i``. All entries must be at least zero. alpha Array of ``group_count`` scalar elements. ``alpha[i]`` specifies the scaling factor for every matrix-matrix product in group ``i``. a Array of pointers to input matrices ``A`` with size ``total_batch_count``. See :ref:`matrix-storage` for more details. lda Array of ``group_count`` integers. ``lda[i]`` specifies the leading dimension of ``A`` for every matrix in group ``i``. All entries must be positive. .. list-table:: :header-rows: 1 * - - ``A`` not transposed - ``A`` transposed * - Column major - ``lda[i]`` must be at least ``m[i]``. - ``lda[i]`` must be at least ``k[i]``. * - Row major - ``lda[i]`` must be at least ``k[i]``. - ``lda[i]`` must be at least ``m[i]``. b Array of pointers to input matrices ``B`` with size ``total_batch_count``. See :ref:`matrix-storage` for more details. ldb Array of ``group_count`` integers. ``ldb[i]`` specifies the leading dimension of ``B`` for every matrix in group ``i``. All entries must be positive. .. list-table:: :header-rows: 1 * - - ``B`` not transposed - ``B`` transposed * - Column major - ``ldb[i]`` must be at least ``k[i]``. - ``ldb[i]`` must be at least ``n[i]``. * - Row major - ``ldb[i]`` must be at least ``n[i]``. - ``ldb[i]`` must be at least ``k[i]``. beta Array of ``group_count`` scalar elements. ``beta[i]`` specifies the scaling factor for matrix ``C`` for every matrix in group ``i``. c Array of pointers to input/output matrices ``C`` with size ``total_batch_count``. See :ref:`matrix-storage` for more details. ldc Array of ``group_count`` integers. ``ldc[i]`` specifies the leading dimension of ``C`` for every matrix in group ``i``. All entries must be positive and ``ldc[i]`` must be at least ``m[i]`` if column major layout is used to store matrices or at least ``n[i]`` if row major layout is used to store matrices. group_count Specifies the number of groups. Must be at least 0. group_size Array of ``group_count`` integers. ``group_size[i]`` specifies the number of matrix multiply products in group ``i``. All entries must be at least 0. dependencies List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies. .. container:: section .. rubric:: Output Parameters c Overwritten by the ``m[i]``-by-``n[i]`` matrix calculated by (``alpha[i]`` * op(``A``)*op(``B``) + ``beta[i]`` * ``C``) for group ``i``. .. container:: section .. rubric:: Notes If ``beta`` = 0, matrix ``C`` does not need to be initialized before calling ``gemm_batch``. .. container:: section .. rubric:: Return Values Output event to wait on to ensure computation is complete. **Strided API** .. rubric:: Syntax .. code-block:: cpp namespace oneapi::mkl::blas::column_major { sycl::event gemm_batch(sycl::queue &queue, onemkl::transpose transa, onemkl::transpose transb, std::int64_t m, std::int64_t n, std::int64_t k, T alpha, const T *a, std::int64_t lda, std::int64_t stridea, const T *b, std::int64_t ldb, std::int64_t strideb, T beta, T *c, std::int64_t ldc, std::int64_t stridec, std::int64_t batch_size, const sycl::vector_class &dependencies = {}) } .. code-block:: cpp namespace oneapi::mkl::blas::row_major { sycl::event gemm_batch(sycl::queue &queue, onemkl::transpose transa, onemkl::transpose transb, std::int64_t m, std::int64_t n, std::int64_t k, T alpha, const T *a, std::int64_t lda, std::int64_t stridea, const T *b, std::int64_t ldb, std::int64_t strideb, T beta, T *c, std::int64_t ldc, std::int64_t stridec, std::int64_t batch_size, const sycl::vector_class &dependencies = {}) } .. container:: section .. rubric:: Input Parameters queue The queue where the routine should be executed. transa Specifies op(``A``) the transposition operation applied to the matrices ``A``. See :ref:`onemkl_datatypes` for more details. transb Specifies op(``B``) the transposition operation applied to the matrices ``B``. See :ref:`onemkl_datatypes` for more details. m Number of rows of op(``A``) and ``C``. Must be at least zero. n Number of columns of op(``B``) and ``C``. Must be at least zero. k Number of columns of op(``A``) and rows of op(``B``). Must be at least zero. alpha Scaling factor for the matrix-matrix products. a Pointer to input matrices ``A`` with size ``stridea`` * ``batch_size``. lda The leading dimension of the matrices ``A``. It must be positive. .. list-table:: :header-rows: 1 * - - ``A`` not transposed - ``A`` transposed * - Column major - ``lda`` must be at least ``m``. - ``lda`` must be at least ``k``. * - Row major - ``lda`` must be at least ``k``. - ``lda`` must be at least ``m``. stridea Stride between different ``A`` matrices. b Pointer to input matrices ``B`` with size ``strideb`` * ``batch_size``. ldb The leading dimension of the matrices``B``. It must be positive. .. list-table:: :header-rows: 1 * - - ``B`` not transposed - ``B`` transposed * - Column major - ``ldb`` must be at least ``k``. - ``ldb`` must be at least ``n``. * - Row major - ``ldb`` must be at least ``n``. - ``ldb`` must be at least ``k``. strideb Stride between different ``B`` matrices. beta Scaling factor for the matrices ``C``. c Pointer to input/output matrices ``C`` with size ``stridec`` * ``batch_size``. ldc The leading dimension of the matrices ``C``. It must be positive and at least ``m`` if column major layout is used to store matrices or at least ``n`` if column major layout is used to store matrices. stridec Stride between different ``C`` matrices. batch_size Specifies the number of matrix multiply operations to perform. dependencies List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies. .. container:: section .. rubric:: Output Parameters c Output matrices, overwritten by ``batch_size`` matrix multiply operations of the form ``alpha`` * op(``A``)*op(``B``) + ``beta`` * ``C``. .. container:: section .. rubric:: Notes If ``beta`` = 0, matrix ``C`` does not need to be initialized before calling ``gemm_batch``. .. container:: section .. rubric:: Return Values Output event to wait on to ensure computation is complete. .. container:: section .. rubric:: Throws This routine shall throw the following exceptions if the associated condition is detected. An implementation may throw additional implementation-specific exception(s) in case of error conditions not covered here. :ref:`oneapi::mkl::invalid_argument` :ref:`oneapi::mkl::unsupported_device` :ref:`oneapi::mkl::host_bad_alloc` :ref:`oneapi::mkl::device_bad_alloc` :ref:`oneapi::mkl::unimplemented` **Parent topic:** :ref:`blas-like-extensions`