Decoding Procedures

The following pseudo code shows the decoding procedure:

MFXVideoDECODE_DecodeHeader(session, bitstream, &init_param);
MFXVideoDECODE_QueryIOSurf(session, &init_param, &request);
allocate_pool_of_frame_surfaces(request.NumFrameSuggested);
MFXVideoDECODE_Init(session, &init_param);
sts=MFX_ERR_MORE_DATA;
for (;;) {
   if (sts==MFX_ERR_MORE_DATA && !end_of_stream())
      append_more_bitstream(bitstream);
   find_unlocked_surface_from_the_pool(&work);
   bits=(end_of_stream())?NULL:bitstream;
   sts=MFXVideoDECODE_DecodeFrameAsync(session,bits,work,&disp,&syncp);
   if (sts==MFX_ERR_MORE_SURFACE) continue;
   if (end_of_stream() && sts==MFX_ERR_MORE_DATA) break;
   if (sts==MFX_ERR_REALLOC_SURFACE) {
      MFXVideoDECODE_GetVideoParam(session, &param);
      realloc_surface(work, param.mfx.FrameInfo);
      continue;
   }
   // skipped other error handling
   if (sts==MFX_ERR_NONE) {
      MFXVideoCORE_SyncOperation(session, syncp, INFINITE);
      do_something_with_decoded_frame(disp);
   }
}
MFXVideoDECODE_Close(session);
free_pool_of_frame_surfaces();

Note the following key points about the example:

  • The application can use the MFXVideoDECODE_DecodeHeader() function to retrieve decoding initialization parameters from the bitstream. This step is optional if the data is retrievable from other sources such as an audio/video splitter.

  • The application can use the MFXVideoDECODE_QueryIOSurf() function to obtain the number of working frame surfaces required to reorder output frames. This step is needed only if the application is responsible for memory allocation; it is not needed when oneVPL is responsible for memory allocation.

  • The application calls the MFXVideoDECODE_DecodeFrameAsync() function for a decoding operation with the bitstream buffer (bits) and an unlocked working frame surface (work) as input parameters.

    Attention

    Starting with oneVPL API version 2.0, the application can provide NULL as the working frame surface, in which case the library allocates the surface internally.

  • If decoding output is not available, the function returns a status code requesting additional bitstream input or working frame surfaces as follows:

    • mfxStatus::MFX_ERR_MORE_DATA: The function needs additional bitstream input. The existing buffer contains less than a frame's worth of bitstream data.

    • mfxStatus::MFX_ERR_MORE_SURFACE: The function needs one more working frame surface to produce any output.

  • Upon successful decoding, the MFXVideoDECODE_DecodeFrameAsync() function returns mfxStatus::MFX_ERR_NONE. However, the decoded frame data (identified by the surface_out pointer) is not yet available because the MFXVideoDECODE_DecodeFrameAsync() function is asynchronous. The application must use the MFXVideoCORE_SyncOperation() function or the mfxFrameSurfaceInterface interface to synchronize the decoding operation before retrieving the decoded frame data (see the sketch after this list).

  • At the end of the bitstream, the application continuously calls the MFXVideoDECODE_DecodeFrameAsync() function with a NULL bitstream pointer to drain any remaining frames cached within the oneVPL decoder until the function returns mfxStatus::MFX_ERR_MORE_DATA.
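
As a minimal sketch, synchronization and surface release can also go through the mfxFrameSurfaceInterface attached to the output surface; this assumes an API 2.x implementation where the FrameInterface pointer of the returned surface is populated:

/* Sketch: synchronizing through the output surface's mfxFrameSurfaceInterface
   instead of MFXVideoCORE_SyncOperation(); assumes an API 2.x implementation. */
if (sts == MFX_ERR_NONE) {
   disp->FrameInterface->Synchronize(disp, MFX_INFINITE); /* wait for decoding to complete */
   do_something_with_decoded_frame(disp);
   disp->FrameInterface->Release(disp);                   /* drop the application's reference */
}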

The following pseudo code shows the simplified decoding procedure:

sts=MFX_ERR_MORE_DATA;
for (;;) {
   if (sts==MFX_ERR_MORE_DATA && !end_of_stream())
      append_more_bitstream(bitstream);
   bits=(end_of_stream())?NULL:bitstream;
   sts=MFXVideoDECODE_DecodeFrameAsync(session,bits,NULL,&disp,&syncp);
   if (sts==MFX_ERR_MORE_SURFACE) continue;
   if (end_of_stream() && sts==MFX_ERR_MORE_DATA) break;
   // skipped other error handling
   if (sts==MFX_ERR_NONE) {
      MFXVideoCORE_SyncOperation(session, syncp, INFINITE);
      do_something_with_decoded_frame(disp);
      release_surface(disp);
   }
}

oneVPL API version 2.0 introduces a new decoding approach. For simple use cases, where the application wants to decode a stream without setting additional parameters, a simplified initialization procedure is available. In this scenario the explicit header-decoding and decoder-initialization stages can be skipped; they are performed implicitly while decoding the first frame. This mode requires setting the mfxBitstream::CodecId field to indicate the codec type. Because the decoder allocates mfxFrameSurface1 internally in this mode, the application should pass NULL as the working surface.
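
The only extra setup for this mode is filling mfxBitstream::CodecId before the first call, as in the sketch below; the HEVC codec choice is an illustrative assumption.

/* Sketch of the API 2.0 simplified mode: no explicit DecodeHeader()/Init() calls. */
bitstream->CodecId = MFX_CODEC_HEVC;   /* required: tells the decoder which codec to expect */
/* Passing NULL as the working surface lets the decoder allocate mfxFrameSurface1 internally. */
sts = MFXVideoDECODE_DecodeFrameAsync(session, bitstream, NULL, &disp, &syncp);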

Bitstream Repositioning

The application can use the following procedure to reposition the bitstream during decoding (a sketch follows the steps):

  1. Use the MFXVideoDECODE_Reset() function to reset the oneVPL decoder.

  2. Optional: If the application maintains a sequence header that correctly decodes the bitstream at the new position, the application may insert the sequence header to the bitstream buffer.

  3. Append the bitstream from the new location to the bitstream buffer.

  4. Resume the decoding procedure. If the sequence header is not inserted in the previous steps, the oneVPL decoder searches for a new sequence header before starting decoding.
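
A sketch of this sequence, reusing the variables from the decoding example above; seek_to(), new_position, and insert_sequence_header() are hypothetical application helpers, and clearing the buffered data is an assumption about how the application manages its own mfxBitstream:

/* Sketch of bitstream repositioning during decoding. */
MFXVideoDECODE_Reset(session, &init_param);   /* step 1: reset the decoder */
bitstream->DataLength = 0;                    /* drop buffered data from the old position */
bitstream->DataOffset = 0;
seek_to(new_position);
insert_sequence_header(bitstream);            /* step 2 (optional) */
append_more_bitstream(bitstream);             /* step 3: data from the new location */
sts = MFXVideoDECODE_DecodeFrameAsync(session, bitstream, work, &disp, &syncp);  /* step 4: resume */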

Broken Streams Handling

Robustness and the capability to handle broken input streams are important parts of the decoder.

First, the start code prefix (ITU-T* H.264 3.148 and ITU-T H.265 3.142) is used to separate NAL units. Then all syntax elements in the bitstream are parsed and verified. If any of the elements violate the specification, the input bitstream is considered invalid and the decoder tries to re-sync (find the next start code). Subsequent decoder behavior is dependent on which syntax element is broken:

  • SPS header is broken: return mfxStatus::MFX_ERR_INCOMPATIBLE_VIDEO_PARAM (HEVC decoder only; the AVC decoder uses the last valid SPS).

  • PPS header is broken: re-sync, use last valid PPS for decoding.

  • Slice header is broken: skip this slice, re-sync.

  • Slice data is broken: corruption flags are set on the output surface (see the sketch below).
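
A minimal sketch of checking those corruption flags after synchronization; the MFX_CORRUPTION_* values mentioned in the comment are examples from the API headers:

/* Sketch: inspect the corruption flags the decoder sets on the output surface. */
MFXVideoCORE_SyncOperation(session, syncp, INFINITE);
if (disp->Data.Corrupted) {
   /* e.g. MFX_CORRUPTION_MINOR or MFX_CORRUPTION_REFERENCE_FRAME was reported */
}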

Many streams have IDR frames with frame_num != 0, while the specification says that “If the current picture is an IDR picture, frame_num shall be equal to 0” (ITU-T H.264 7.4.3).

VUI is also validated, but errors do not invalidate the whole SPS. The decoder either does not use the corrupted VUI (AVC) or resets incorrect values to default (HEVC).

Note

Some requirements are relaxed because there are many streams which violate the strict standard but can be decoded without errors.

Corruption at the reference frame is spread over all inter-coded pictures that use the reference frame for prediction. To cope with this problem you must either periodically insert I-frames (intra-coded) or use the intra-refresh technique. The intra-refresh technique allows recovery from corruptions within a predefined time interval. The main point of intra-refresh is to insert a cyclic intra-coded pattern (usually a row) of macroblocks into the inter-coded pictures, restricting motion vectors accordingly. Intra-refresh is often used in combination with recovery point SEI, where the recovery_frame_cnt is derived from the intra-refresh interval. The recovery point SEI message is well described at ITU-T H.264 D.2.7 and ITU-T H.265 D.2.8. If decoding starts from AU associated with this SEI message, then the message can be used by the decoder to determine from which picture all subsequent pictures have no errors. In comparison to IDR, the recovery point message does not mark reference pictures as “unused for reference”.

Besides validation of syntax elements and their constraints, the decoder also uses various hints to handle broken streams:

  • If there are no valid slices for the current frame, then the whole frame is skipped.

  • The slices which violate slice segment header semantics (ITU-T H.265 7.4.7.1) are skipped. Only the slice_temporal_mvp_enabled_flag is checked for now.

  • Since an LTR (Long Term Reference) frame stays in the DPB until it is explicitly cleared by an IDR or MMCO, an incorrect LTR could cause long-standing visual artifacts. The AVC decoder uses the following approaches to handle this:

    • When an incorrect MMCO command that marks a reference picture as long-term causes a DPB overflow, the operation is rolled back.

    • An IDR frame with frame_num != 0 cannot be an LTR.

  • If the decoder detects frame gapping, it inserts “fake” (marked as non-existing) frames, updates FrameNumWrap (ITU-T H.264 8.2.4.1) for reference frames, and applies the Sliding Window (ITU-T H.264 8.2.5.3) marking process. Fake frames are marked as reference, but since they are marked as non-existing, they are not used for inter-prediction.

VP8 Specific Details

Unlike the other decoders supported by oneVPL, the VP8 decoder accepts only a complete frame as input. The application should provide the complete frame accompanied by the MFX_BITSTREAM_COMPLETE_FRAME flag; this is the only VP8-specific difference (see the sketch below).
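
For illustration, each frame-sized chunk might be submitted as in the following sketch, where frame_data and frame_size come from a hypothetical demuxer:

/* Sketch: VP8 expects exactly one complete frame per call, flagged accordingly. */
bitstream->Data       = frame_data;
bitstream->DataOffset = 0;
bitstream->DataLength = frame_size;
bitstream->MaxLength  = frame_size;
bitstream->DataFlag   = MFX_BITSTREAM_COMPLETE_FRAME;  /* the buffer holds a complete frame */
sts = MFXVideoDECODE_DecodeFrameAsync(session, bitstream, work, &disp, &syncp);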

JPEG

The application can use the same decoding procedures for JPEG/motion JPEG decoding, as shown in the following pseudo code:

// optional; retrieve initialization parameters
MFXVideoDECODE_DecodeHeader(...);
// decoder initialization
MFXVideoDECODE_Init(...);
// single frame/picture decoding
MFXVideoDECODE_DecodeFrameAsync(...);
MFXVideoCORE_SyncOperation(...);
// optional; retrieve meta-data
MFXVideoDECODE_GetUserData(...);
// close
MFXVideoDECODE_Close(...);

The MFXVideoDECODE_Query() function will return mfxStatus::MFX_ERR_UNSUPPORTED if the input bitstream contains unsupported features.
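
As a sketch, the parameters obtained from MFXVideoDECODE_DecodeHeader() can be validated before initialization; this reuses init_param from the earlier example:

/* Sketch: validate the decoded-header parameters before Init(). */
mfxVideoParam out_param = init_param;
sts = MFXVideoDECODE_Query(session, &init_param, &out_param);
if (sts == MFX_ERR_UNSUPPORTED) {
   /* the bitstream uses features this implementation cannot decode */
}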

For still picture JPEG decoding, the input can be any JPEG bitstreams that conform to the ITU-T Recommendation T.81 with an EXIF or JFIF header. For motion JPEG decoding, the input can be any JPEG bitstreams that conform to the ITU-T Recommendation T.81.

Unlike other oneVPL decoders, JPEG decoding supports three different output color formats: NV12, YUY2, and RGB32. This support sometimes requires internal color conversion and more complicated initialization. The color format of the input bitstream is described by the mfxInfoMFX::JPEGChromaFormat and mfxInfoMFX::JPEGColorFormat fields. The MFXVideoDECODE_DecodeHeader() function usually fills them in. If the JPEG bitstream does not contain color format information, the application should provide it. The output color format is described by the general oneVPL parameters: the mfxFrameInfo::FourCC and mfxFrameInfo::ChromaFormat fields.
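
As a sketch, the application might request RGB32 output after MFXVideoDECODE_DecodeHeader() has filled in the bitstream's own color description; the ChromaFormat pairing below follows common usage and is an assumption:

/* Sketch: request RGB32 output from the JPEG decoder. */
MFXVideoDECODE_DecodeHeader(session, bitstream, &init_param);
init_param.mfx.FrameInfo.FourCC       = MFX_FOURCC_RGB4;         /* RGB32 output */
init_param.mfx.FrameInfo.ChromaFormat = MFX_CHROMAFORMAT_YUV444;
MFXVideoDECODE_Init(session, &init_param);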

Motion JPEG supports interlaced content by compressing each field (a half-height frame) individually. This behavior is incompatible with the rest of the oneVPL transcoding pipeline, where oneVPL requires fields to be in odd and even lines of the same frame surface. The decoding procedure is therefore modified so that the two decoded fields end up in the odd and even lines of a single output frame surface.

By default, the MFXVideoDECODE_DecodeHeader() function returns the Rotation parameter so that after rotation, the pixel at the first row and first column is at the top left. The application can overwrite the default rotation before calling MFXVideoDECODE_Init().
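
For example, a hypothetical application that wants the picture exactly as stored in the bitstream could override the reported rotation before initialization:

/* Sketch: override the rotation reported by DecodeHeader() before Init(). */
MFXVideoDECODE_DecodeHeader(session, bitstream, &init_param);
init_param.mfx.Rotation = MFX_ROTATION_0;   /* decode without rotation */
MFXVideoDECODE_Init(session, &init_param);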

The application may specify Huffman and quantization tables during decoder initialization by attaching mfxExtJPEGQuantTables and mfxExtJPEGHuffmanTables buffers to the mfxVideoParam structure. In this case, the decoder ignores tables from bitstream and uses the tables specified by the application. The application can also retrieve these tables by attaching the same buffers to mfxVideoParam and calling MFXVideoDECODE_GetVideoParam() or MFXVideoDECODE_DecodeHeader() functions.
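
A sketch of attaching application-defined tables at initialization follows; the table contents themselves are omitted:

/* Sketch: supply application-defined quantization and Huffman tables at Init(). */
mfxExtJPEGQuantTables   quant = {0};
mfxExtJPEGHuffmanTables huff  = {0};
quant.Header.BufferId = MFX_EXTBUFF_JPEG_QT;
quant.Header.BufferSz = sizeof(quant);
huff.Header.BufferId  = MFX_EXTBUFF_JPEG_HUFFMAN;
huff.Header.BufferSz  = sizeof(huff);
/* ... fill the table data here ... */

mfxExtBuffer *ext[2] = { (mfxExtBuffer *)&quant, (mfxExtBuffer *)&huff };
init_param.ExtParam    = ext;
init_param.NumExtParam = 2;
MFXVideoDECODE_Init(session, &init_param);   /* the decoder now ignores the tables in the bitstream */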

Multi-view Video Decoding

The oneVPL MVC decoder operates on complete MVC streams that contain all view and temporal configurations. The application can configure the oneVPL decoder to generate a subset at the decoding output. To do this, the application must understand the stream structure and use the stream information to configure the decoder for target views.

The decoder initialization procedure is as follows:

  1. The application calls the MFXVideoDECODE_DecodeHeader() function to obtain the stream structural information. This is done in two steps:

    1. The application calls the MFXVideoDECODE_DecodeHeader() function with the mfxExtMVCSeqDesc structure attached to the mfxVideoParam structure. At this point, do not allocate memory for the arrays in the mfxExtMVCSeqDesc structure. Set the View, ViewId, and OP pointers to NULL and set NumViewAlloc, NumViewIdAlloc, and NumOPAlloc to zero. The function parses the bitstream and returns mfxStatus::MFX_ERR_NOT_ENOUGH_BUFFER with the correct values for NumView, NumViewId, and NumOP. This step can be skipped if the application is able to obtain the NumView, NumViewId, and NumOP values from other sources.

    2. The application allocates memory for the View, ViewId, and OP arrays and calls the MFXVideoDECODE_DecodeHeader() function again. The function returns the MVC structural information in the allocated arrays (see the sketch after these steps).

  2. The application fills the mfxExtMVCTargetViews structure to choose the target views, based on information described in the mfxExtMVCSeqDesc structure.

  3. The application initializes the oneVPL decoder using the MFXVideoDECODE_Init() function. The application must attach both the mfxExtMVCSeqDesc structure and the mfxExtMVCTargetViews structure to the mfxVideoParam structure.
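
The two-pass MFXVideoDECODE_DecodeHeader() call from step 1 might look like the following standalone sketch; the malloc-based allocation is only one possible way for the application to manage these arrays:

/* Sketch of the two-pass header parsing from step 1. */
mfxExtMVCSeqDesc seq_desc = {0};
seq_desc.Header.BufferId = MFX_EXTBUFF_MVC_SEQ_DESC;
seq_desc.Header.BufferSz = sizeof(seq_desc);

mfxExtBuffer *eb_hdr[1] = { (mfxExtBuffer *)&seq_desc };
mfxVideoParam hdr_param = {0};
hdr_param.ExtParam    = eb_hdr;
hdr_param.NumExtParam = 1;

/* First pass: View, ViewId, and OP are NULL, so the function only reports the required sizes. */
mfxStatus sts = MFXVideoDECODE_DecodeHeader(session, bitstream, &hdr_param);
if (sts == MFX_ERR_NOT_ENOUGH_BUFFER) {
   seq_desc.View   = (mfxMVCViewDependency *)malloc(seq_desc.NumView * sizeof(mfxMVCViewDependency));
   seq_desc.ViewId = (mfxU16 *)malloc(seq_desc.NumViewId * sizeof(mfxU16));
   seq_desc.OP     = (mfxMVCOperationPoint *)malloc(seq_desc.NumOP * sizeof(mfxMVCOperationPoint));
   seq_desc.NumViewAlloc   = seq_desc.NumView;
   seq_desc.NumViewIdAlloc = seq_desc.NumViewId;
   seq_desc.NumOPAlloc     = seq_desc.NumOP;

   /* Second pass: the allocated arrays receive the MVC structural information. */
   sts = MFXVideoDECODE_DecodeHeader(session, bitstream, &hdr_param);
}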

In the above steps, do not modify the values of the mfxExtMVCSeqDesc structure after the MFXVideoDECODE_DecodeHeader() function, as the oneVPL decoder uses the values in the structure for internal memory allocation. Once the application configures the oneVPL decoder, the rest of the decoding procedure remains unchanged. As shown in the pseudo code below, the application calls the MFXVideoDECODE_DecodeFrameAsync() function multiple times to obtain all target views of the current frame picture, one target view at a time. The target view is identified by the FrameID field of the mfxFrameInfo structure.

mfxExtBuffer *eb[2];
mfxExtMVCSeqDesc  seq_desc;
mfxVideoParam init_param;

init_param.ExtParam=(mfxExtBuffer **)&eb;
init_param.NumExtParam=1;
eb[0]=(mfxExtBuffer *)&seq_desc;
MFXVideoDECODE_DecodeHeader(session, bitstream, &init_param);

/* select views to decode */
mfxExtMVCTargetViews tv;
init_param.NumExtParam=2;
eb[1]=(mfxExtBuffer *)&tv;
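/* fill tv here, for example tv.Header, tv.NumView, and tv.ViewId[],
   based on the information reported in seq_desc (omitted in this pseudo code) */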

/* initialize decoder */
MFXVideoDECODE_Init(session, &init_param);

/* perform decoding */
for (;;) {
    MFXVideoDECODE_DecodeFrameAsync(session, bits, work, &disp, &syncp);
    MFXVideoCORE_SyncOperation(session, syncp, INFINITE);
}

/* close decoder */
MFXVideoDECODE_Close(session);