Storage Formats

Depending on the value of the domain template value, the implementation of the DFT supports several storage schemes for input and output data. (See Charles Van Loan, Computational Frameworks for the Fast Fourier Transform , SIAM, Philadelphia, 1992 for motivation of these schemes).

The data elements are placed within contiguous memory blocks, defined with generalized strides (see INPUT_STRIDES and OUTPUT_STRIDES). For multiple transforms, all sets of data should be located within the same memory block, and the data sets should be placed at the same distance from each other (see NUMBER_OF TRANSFORMS and :config_param::FWD_DISTANCE, config_param::BWD_DISTANCE ).

The input data and strides sizes are stored and offsets counted in terms of elements of the data type (complex or real) based on the storage format and forward domain as seen in Element types for complex-to-complex transformation and COMPLEX_STORAGE, Element types for real-to-complex transformations and REAL_STORAGE and Element types for real-to-complex transformations and CONJUGATE_EVEN_STORAGE.

COMPLEX_STORAGE

For the domain template parameter with value COMPLEX, both input and output sequences belong to the complex domain. In this case, the configuration parameter COMPLEX_STORAGE can have one of the two values:COMPLEX_COMPLEX (default) or REAL_REAL .

COMPLEX_COMPLEX

With the config_value::COMPLEX_COMPLEX storage, complex-valued data sequences are stored in a single complex container (array/sycl::buffer), AZ, so that a complex-valued element \(z_{k_1, k_2,\dots ,k_d}\) of the m-th d-dimensional sequence is accessed at \(AZ[m*\text{distance} + \text{stride}_0 + k_1*\text{stride}_1 + k_2*\text{stride}_2 + \dots + k_d *\text{stride}_d ]\) as a structure consisting of the real and imaginary parts. This code illustrates the use of config_value::COMPLEX_COMPLEX storage with three dimensions (\(n_1,n_2,n_3\)) and m batches:

std::complex<datatype> * AZ; // 2*sizeof(datatype)*n1*n2*n3*m
std::vector<std::int64_t> ios; // length 4 of sizes
std::int64_t iodist;

// ...

// on input: Z(k1,k2,k3,m)
//   = AZ[ ios[0] + k1*ios[1] + k2*ios[2] + k3*ios[3] + m*iodist ]
compute_forward(descr, AZ); // complex-to-complex in-place FFT
// on output: Z{k1,k2,k3,m}
//   = AZ[ ios[0] + k1*ios[1] + k2*ios[2] + k3*ios[3] + m*iodist ]

REAL_REAL

With the config_value::REAL_REAL storage, complex-valued data sequences are stored by two real containers (arrays/sycl::buffer’s), AR and AI, so that a complex-valued element \(z_{k_1, k_2, \dots, k_d}\) of the m-th d-dimensional sequence has real part \(AR[m*\text{distance} + \text{stride}_0 + k_1*\text{stride}_1 + k_2*\text{stride}_2 + \dots + k_d*\text{stride}_d ]\) and imaginary part \(AI[m*\text{distance} + \text{stride}_0 + k_1*\text{stride}_1 + k_2*\text{stride}_2 + \dots + k_d*\text{stride}_d ]\). This code illustrates the use of config_value::REAL_REAL storage with three dimensions (\(n_1,n_2,n_3\)) and m batches:

datatype * AR; // sizeof(datatype)*n1*n2*n3*m
datatype * AI; // sizeof(datatype)*n1*n2*n3*m
std::vector<std::int64_t> ios; // length 4 of strides
std::int64_t iodist;

// ...

// on input: Z(k1,k2,k3,m)
//   =   AR[ ios[0] + k1*ios[1] + k2*ios[2] + k3*ios[3] + m*iodist ]
//   + i*AI[ ios[0] + k1*ios[1] + k2*ios[2] + k3*ios[3] + m*iodist ]
compute_forward(descr, AR, AI); // complex-to-complex in-place FFT
// on output: Z{k1,k2,k3,m}
//   =   AR[ ios[0] + k1*ios[1] + k2*ios[2] + k3*ios[3] + m*iodist ]
//   + i*AI[ ios[0] + k1*ios[1] + k2*ios[2] + k3*ios[3] + m*iodist ]

REAL_STORAGE

For the domain template parameter with value REAL, only the value of REAL_REAL is supported.

REAL_REAL

With the REAL_REAL storage, real-valued data sequences in a real domain are stored by one real container (array/sycl::buffer), AR, so that a real-valued element \(r_{k_1, k_2, \dots, k_d}\) of the m-th d-dimensional sequence is accessed as \(AR[m*\text{distance} + \text{stride}_0 + k_1*\text{stride}_1 + k_2*\text{stride}_2 + \dots + k_d*\text{stride}_d ]\). This code illustrates the use of config_value::REAL_REAL storage with three dimensions (\(n_1,n_2,n_3\)) and m batches:

datatype * AR; // sizeof(datatype)*n1*n2*n3*m
datatype * AI; // sizeof(datatype)*n1*n2*n3*m
std::vector<std::int64_t> ios; // length 4 of strides
std::int64_t iodist;

// ...

// on input: R(k1,k2,k3,m)
//   =   AR[ ios[0] + k1*ios[1] + k2*ios[2] + k3*ios[3] + m*iodist ]
compute_forward(descr, AR, AI); // real-to-complex in-place FFT
// on output: Z{k1,k2,k3,m}
//   =   AR[ ios[0] + k1*ios[1] + k2*ios[2] + k3*ios[3] + m*iodist ]
//   + i*AI[ ios[0] + k1*ios[1] + k2*ios[2] + k3*ios[3] + m*iodist ]

CONJUGATE_EVEN_STORAGE

For the domain template parameter with value REAL and considered as a conjugate-even domain, the value of config_value::COMPLEX_COMPLEX is supported. The conjugate-even symmetry of the data enables storing only about a half of the whole mathematical result, so that one part of it can be directly referenced in the memory while the other part can be reconstructed depending on the selected storage configuration. The config_param::PACKED_FORMAT configuration parameter defines how the data is packed. Possible values for config_param::PACKED_FORMAT depend on the values of the config_param::CONJUGATE_EVEN_STORAGE configuration parameter.

CONJUGATE_EVEN_STORAGE

Supported PACKED_FORMATS

COMPLEX_COMPLEX

config_value::CCE_FORMAT can be used with transforms of any dimension.

COMPLEX_COMPLEX

There is only one config_param::PACKED_FORMAT supported by the config_value::COMPLEX_COMPLEX value for config_param::CONJUGATE_EVEN_STORAGE, mainly the config_value::CCE_FORMAT. The complex-valued data sequence consists of one complex container (array/sycl::buffer), AZ, so that a complex-valued element \(z_{k_1, k_2, \dots, k_d}\) of the m-th d-dimensional sequence can be accessed or reconstructed as follows:

Consider a d-dimensional real-to-complex transform.

Because the input sequence, R, is real-valued, the mathematical result, Z, has conjugate-even symmetry: \(z_{k_1, k_2, \dots, k_d} = \text{conjugate}( z_{n_1-k_1, n_2-k_2, \dots, n_d-k_d} )\), where index arithmetic is performed modulo the length of the respective dimension. Obviously, the first element of the result is real-valued: \(z_{0, 0, \dots, 0} = \text{conjugate}( z_{0, 0, \dots, 0} )\).

For dimensions with even lengths, some of the other elements are real-valued as well. For example, if \(n_s\) is even, then \(z_{0, 0, \dots, \frac{n_s}{2}, 0, \dots, 0} = \text{conjugate}( z_{0, 0, \dots, \frac{n_s}{2}, 0, \dots, 0} )\). With the conjugate-even symmetry, approximately a half of the result suffices to fully reconstruct it. For an arbitrary dimension, \(h\) , it suffices to store elements \(z_{k_1, \dots, k_h , \dots, k_d}\) for the following indices:

  • \(k_h = 0, \dots, \left[ \frac{n_h}{2}\right]\)

  • \(k_i = 0, \dots, n_i-1\), where \(i = 1,\dots, d\) and \(i \neq h\)

and assuming that integer division rounds down.

The symmetry property enables reconstructing the remaining elements: for \(k_h = \left[ \frac{n_h}{2}\right] + 1, \dots , n_h - 1\). The halved dimension is always assumed to be the dimension for which storage is contiguous in memory (see strides), for example in a 2D row-major format, it is the last dimension and for 2D column-major format it is the first dimension.

Packed complex domain formats for a 1D real-to-complex transformation considered as a conjugate-even-domain with COMPLEX_COMPLEX storage and \(n=2L\) (even size) or \(n=2L+1\) (odd size).

\(k=\)

0

1

2

\(\dots\)

L-2

L-1

L

CCE

\(Z_0\)

\(Z_1\)

\(Z_2\)

\(\dots\)

\(Z_{L-2}\)

\(Z_{L-1}\)

\(Z_{L}\)

Packed complex domain formats for a 2D \(n_1\times n_2\) real-to-complex transformations considered as a conjugate-even-domain with COMPLEX_COMPLEX storage and \(n_1=2K\) (even size) and \(n_2=2L\) (even size) using row-major input data.

\(k_1\backslash k_2\)

\(0\)

1

2

\(\dots\)

L-1

L

\(0\)

\(Z_{0,0}\)

\(Z_{0,1}\)

\(Z_{0,2}\)

\(\dots\)

\(Z_{0,L-1}\)

\(Z_{0,L}\)

1

\(Z_{1,0}\)

\(Z_{1,1}\)

\(Z_{1,2}\)

\(\dots\)

\(Z_{1,L-1}\)

\(Z_{1,L}\)

2

\(Z_{2,0}\)

\(Z_{2,1}\)

\(Z_{2,2}\)

\(\dots\)

\(Z_{2,L-1}\)

\(Z_{2,L}\)

\(\dots\)

\(\dots\)

\(\dots\)

\(\dots\)

\(\dots\)

\(\dots\)

\(\dots\)

\(n_1-2\)

\(Z_{n_1-2,0}\)

\(Z_{n_1-2,1}\)

\(Z_{n_1-2,2}\)

\(\dots\)

\(Z_{n_1-2,L-1}\)

\(Z_{n_1-2,L}\)

\(n_1-1\)

\(Z_{n_1-1,0}\)

\(Z_{n_1-1,1}\)

\(Z_{n_1-1,2}\)

\(\dots\)

\(Z_{n_1-1,L-1}\)

\(Z_{n_1-1,L}\)

The following code illustrates usage of the config_value::COMPLEX_COMPLEX storage for a two-dimensional conjugate-even domain with row-major input data:

datatype * AR;                // sizeof(datatype)*n1*n2*m
std::complex<datatype> * AZ;  // sizeof(datatype)*n1*n2*m
std::vector<std::int64_t> is; // length 3 of input strides
std::vector<std::int64_t> os; // length 3 of output strides
std::int64_t idist, odist;

// ...

// on input: R(k1,k2,m)
//   =   AR[ is[0] + k1*is[1] + k2*is[2] + m*idist ]
compute_forward(descr, AR, AZ); // real-to-complex out-of-place FFT
// on output:
// for k2=0,n2/2:      Z{k1,k2,m} = AZ[os[0] + k1*os[1] + k2*os[2] + m*odist]
// for k2=n2/2+1,n2-1: Z{k1,k2,m} = conj(AZ[os[0] + (n1-k1)%n1*os[1]
//                                                + (n2-k2)%n2*os[2] + m*odist])

For the backward transform, the input and output parameters and layouts exchange roles. Set the strides describing the layout in the backward/forward domain as input/output strides, respectively. For example:

// ...
descr.set_value(config_param::INPUT_STRIDES,  fwd_domain_strides);
descr.set_value(config_param::OUTPUT_STRIDES, bwd_domain_strides);
descr.commit(queue);
compute_forward(descr, ...);
// ...
descr.set_value(config_param::INPUT_STRIDES,  bwd_domain_strides);
descr.set_value(config_param::OUTPUT_STRIDES, fwd_domain_strides);
descr.commit(queue);
compute_backward(descr, ...);

Parent topic Configuration Parameters and Enums