Video processing project blog: media pipeline

Sunday, June 24, 2012

Using the FrameSkippingFilter to drop the framerate

Sometimes it is desirable to reduce the framerate of a video. In the FrameSkippingFilter, the user selects a value n, and one frame is dropped every n frames. Setting n = 2 drops every second frame, which makes it simple to halve the framerate.
The filter could easily be extended to perform more complex dropping schemes, but it does not currently support them.
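
The dropping rule can be sketched as a simple counter. This is only an illustration and not the filter's actual code: the class name FrameSkipper and the method shouldDrop are hypothetical, and the sketch assumes that "dropped every n frames" means the n-th frame of each group of n is discarded.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not taken from the FrameSkippingFilter source):
// decides whether the current frame should be dropped.
class FrameSkipper
{
public:
    // n = 2 drops every second frame, n = 3 every third frame, and so on.
    explicit FrameSkipper(std::uint32_t n) : m_n(n), m_count(0)
    {
        assert(n >= 2 && "n = 1 would drop every frame");
    }

    // Call once per incoming frame. Returns true if the frame should be dropped.
    bool shouldDrop()
    {
        ++m_count;
        if (m_count == m_n)
        {
            m_count = 0;   // start counting the next group of n frames
            return true;   // drop the n-th frame of the group
        }
        return false;      // pass the frame downstream
    }

private:
    std::uint32_t m_n;      // group size chosen by the user
    std::uint32_t m_count;  // frames seen in the current group
};
```

With n = 2 this drops frames 2, 4, 6 and so on, so half of the frames reach the next filter.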

An example media pipeline shows the results:

A FramerateDisplayFilter has been inserted before the FrameSkippingFilter, and one after, to illustrate the effect.

As can be seen in the screen capture, the framerate has been halved.

Using the FramerateDisplayFilter

The FramerateDisplayFilter is useful for checking what framerate you are achieving in a live multimedia pipeline. In a live pipeline, it could be that an encoder is not compressing the media fast enough. It is sometimes desirable to know what framerate is achievable using different resolutions, encoder modes, etc. The filter could also be useful in a system that performs dynamic bitrate adaptation.

The FramerateDisplayFilter uses a moving average over the last 50 samples to calculate the average and renders the estimate on top of the video using GDI+.
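
A moving-average estimate of this kind could look roughly like the sketch below. The class FramerateEstimator and its methods are hypothetical names, not the filter's real code. The sketch assumes that each "sample" is a time value in seconds and that the average is taken over the intervals between the stored times; only the window size of 50 comes from the description above.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical sketch of a moving-average framerate estimate.
class FramerateEstimator
{
public:
    explicit FramerateEstimator(std::size_t windowSize = 50)
        : m_windowSize(windowSize) {}

    // Record the time (in seconds) at which a frame was observed.
    void addSample(double timeSeconds)
    {
        m_times.push_back(timeSeconds);
        if (m_times.size() > m_windowSize)
            m_times.pop_front();   // keep only the most recent samples
    }

    // Average frames per second over the window, or 0.0 if not yet known.
    double framesPerSecond() const
    {
        if (m_times.size() < 2)
            return 0.0;
        const double elapsed = m_times.back() - m_times.front();
        if (elapsed <= 0.0)
            return 0.0;
        // N stored times span N - 1 frame intervals.
        return static_cast<double>(m_times.size() - 1) / elapsed;
    }

private:
    std::size_t m_windowSize;
    std::deque<double> m_times;
};
```

Averaging over a window smooths out jitter between individual frames, at the cost of reacting more slowly when the framerate actually changes.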

The FramerateDisplayFilter inherits CTransInPlaceFilter and currently has the following configuration options:

mode: time-stamp or system-time
X: x-position of the estimate (this can be off the screen)
Y: y-position of the estimate (this can be off the screen)

In time-stamp mode, the timestamps of the actual media samples are used in the average calculation.

In system-time mode, the time at which the sample passes through the filter is used in the average calculation.

Depending on the pipeline, the difference between the two may be small or large.
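
The choice between the two modes comes down to which time value feeds the average. The sketch below illustrates this under stated assumptions: the enum TimeSource and the three functions are hypothetical names, and the sample start time is assumed to be a DirectShow REFERENCE_TIME, which is expressed in 100 ns units.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical names; the filter's real option values are not shown in this post.
enum class TimeSource { TimeStamp, SystemTime };

// DirectShow sample times (REFERENCE_TIME) are expressed in 100 ns units.
inline double referenceTimeToSeconds(std::int64_t referenceTime)
{
    return static_cast<double>(referenceTime) / 10000000.0;
}

// Seconds on a monotonic clock, read at the moment the sample passes through.
inline double systemTimeSeconds()
{
    using namespace std::chrono;
    return duration<double>(steady_clock::now().time_since_epoch()).count();
}

// Pick the time value that feeds the framerate average.
inline double sampleTimeSeconds(TimeSource mode, std::int64_t sampleStartTime)
{
    return mode == TimeSource::TimeStamp
        ? referenceTimeToSeconds(sampleStartTime)
        : systemTimeSeconds();
}
```

The two values can diverge when, for example, a stage upstream delivers samples faster or slower than their timestamps suggest.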

As per usual, all settings are programmatically configurable using the COM ISettingsInterface interface.

On a side note, if anyone is interested in contributing to the development of this filter, capabilities to set the font, font colour, etc. via the property page are still required.

Wednesday, March 14, 2012

Improving live multimedia pipeline performance

In this post, we will discuss code optimisation techniques necessary in real-time media pipelines. Live video requires that the media is processed fast enough to achieve the desired framerate. For example, a framerate of 15 fps means that each frame should take no longer than 1000/15 = 66.6 ms.

A video pipeline is typically comprised of a media source, colour converters, scalers, croppers, video mixers, video codecs and media sinks.

Media pipeline

This means that all the operations together cannot take more than 66.6 ms.
Although operations such as colour conversion are considered lightweight in relation to the video encoding, each link in the chain should be written as efficiently as possible, within reasonable means. (Shaving a millisecond off colour conversion is not really going to make much difference if it is the encoder that takes 50 ms per frame.)
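
The budget arithmetic can be written down as a small check. This is a sketch only: the function names are hypothetical, and any per-stage costs passed in would have to come from your own measurements.

```cpp
#include <numeric>
#include <vector>

// Time available per frame, in milliseconds, at the given framerate.
inline double frameBudgetMs(double framesPerSecond)
{
    return 1000.0 / framesPerSecond;
}

// True if the summed per-frame costs of all pipeline stages fit in the budget.
inline bool fitsFrameBudget(const std::vector<double>& stageCostsMs,
                            double framesPerSecond)
{
    const double totalMs =
        std::accumulate(stageCostsMs.begin(), stageCostsMs.end(), 0.0);
    return totalMs <= frameBudgetMs(framesPerSecond);
}
```

For instance, a 1 ms colour conversion plus a 50 ms encode fits within the 15 fps budget, which is why the encoder dominates the decision in that case.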

The question is: how can we improve algorithm performance?

Using fixed-point arithmetic over floating point
Fewer copies
Lookup tables over computation
Multi-threading
Increasing cache hits
Using processor-specific knowledge (e.g. SIMD)
Using GPUs?
Improved algorithm structure (the big picture)

There is usually some kind of trade-off between speed and memory usage. In the case of the look-up table approach, there could be a slight computational overhead on start-up to compute the look-up table, with the benefit of fewer computations once the application is in a steady state. One should also take factors such as the size of the look-up table and the target environment (i.e. desktop vs. embedded device) into account.

In this post, we will try out various techniques to improve the performance of the RGB to YUV420 colour converter (source code available at the Video Processing Project).
The FrameGrabber project builds a simple multimedia pipeline consisting of a source, a sample grabber and a video renderer.

Once the sample grabber callback is triggered, we do the following:
- convert from RGB to YUV420
- convert back to RGB
- render image for visual confirmation that the conversion is correct.

The original colour conversion code can be seen in RealRGB24toYUV420Converter.cpp.

We will try to improve on this by adding a lookup table to minimize the multiplications as can be seen in FastLookupTableRGB24toYUV420Converter.cpp.
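
To illustrate the idea, here is a minimal lookup-table sketch for the luma (Y) component only. It is not the code from FastLookupTableRGB24toYUV420Converter.cpp. The class name is hypothetical, and the sketch assumes full-range BT.601 weights (0.299, 0.587, 0.114), which may differ from the coefficients used in the project.

```cpp
#include <array>
#include <cstdint>

// Hypothetical sketch: precompute each weighted colour contribution once,
// so the per-pixel work becomes three table reads and two additions.
class LumaLookupTable
{
public:
    LumaLookupTable()
    {
        for (int i = 0; i < 256; ++i)
        {
            m_r[i] = 0.299 * i;   // assumed BT.601 red weight
            m_g[i] = 0.587 * i;   // assumed BT.601 green weight
            m_b[i] = 0.114 * i;   // assumed BT.601 blue weight
        }
    }

    std::uint8_t luma(std::uint8_t r, std::uint8_t g, std::uint8_t b) const
    {
        // Adding 0.5 before truncation rounds to the nearest integer.
        return static_cast<std::uint8_t>(m_r[r] + m_g[g] + m_b[b] + 0.5);
    }

private:
    std::array<double, 256> m_r;
    std::array<double, 256> m_g;
    std::array<double, 256> m_b;
};
```

The three tables take about 6 KB here, which is the memory cost paid for removing the per-pixel multiplications.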

Next, we approach the problem using fixed point arithmetic as can be seen in FastFixedPointRGB24toYUV420Converter.cpp.
Here the idea is to use integer arithmetic over floating point.
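
As a rough illustration of the fixed-point idea, the luma calculation can be done with integer weights scaled by 256. This is a sketch under assumptions and not the code from FastFixedPointRGB24toYUV420Converter.cpp: the function name is hypothetical, and the weights 77, 150 and 29 are the full-range BT.601 weights (0.299, 0.587, 0.114) multiplied by 256 and rounded.

```cpp
#include <cstdint>

// Hypothetical sketch: 8-bit fixed-point luma.
// 77 + 150 + 29 = 256, so white (255, 255, 255) still maps to 255.
inline std::uint8_t lumaFixedPoint(std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    const std::uint32_t kR = 77;    // round(0.299 * 256)
    const std::uint32_t kG = 150;   // round(0.587 * 256)
    const std::uint32_t kB = 29;    // round(0.114 * 256)

    // Adding 128 (half of 256) rounds to nearest before the shift divides by 256.
    const std::uint32_t sum = kR * r + kG * g + kB * b + 128;
    return static_cast<std::uint8_t>(sum >> 8);
}
```

The result can differ from the floating-point value by one level for some inputs, which is the precision traded for speed.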

Finally, we attempt to use SIMD instructions to improve the colour converter performance, as can be seen in FastSimdRGB24toYUV420Converter.cpp.

The FrameGrabber application is called with the following parameters:
FrameGrabber <<File>>.avi mode=0
where
mode 0 = original algorithm
mode 1 = lookup table
mode 2 = fixed-point arithmetic
mode 3 = SIMD
mode 4 = GPU (unimplemented)
mode 5 = multi-threaded

The standard Foreman test video sequence with CIF resolution was used as the video source in this experiment. The application was run 5 times per mode using an automated script and the results were averaged.
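
The experiment itself was driven by a script, which is not shown here. For readers who want to repeat the measurement inside a program instead, an averaged timing can be sketched as follows. The helper averageRunMs is a hypothetical name and is not part of the FrameGrabber source.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: runs `work` `runs` times and returns the mean
// wall-clock duration of one run in milliseconds.
inline double averageRunMs(int runs, const std::function<void()>& work)
{
    using Clock = std::chrono::steady_clock;
    if (runs <= 0)
        return 0.0;

    double totalMs = 0.0;
    for (int i = 0; i < runs; ++i)
    {
        const Clock::time_point start = Clock::now();
        work();
        const Clock::time_point end = Clock::now();
        totalMs += std::chrono::duration<double, std::milli>(end - start).count();
    }
    return totalMs / runs;
}
```

Calling averageRunMs(5, work) mirrors the five runs per mode used here, with work standing in for whatever conversion is being measured.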

Mode   Total average   Per frame   Improvement %
0      1224.46 ms      1.16 ms
1      1105.63 ms      1.02 ms     12.26%
2      969.82 ms       0.55 ms     53.10%
3      1572.18 ms      1.24 ms     -6.74%
5      1106.09 ms      0.59 ms     49.13%

As expected, using a look-up table yields a notable improvement over the original algorithm. The fixed-point arithmetic version performs best of all and is roughly twice as fast as the original. Surprisingly, the SIMD approach yielded no improvement and in fact performs slightly worse than the original. This could, however, be an implementation issue. (If you have a better solution, please drop us a line.) FYI, the question was posted on Stack Overflow. The multi-threaded approach also yields a performance gain, though it should be used with caution: I would not advise spawning additional threads purely for the purpose of optimising colour conversion.

Comments/criticism/suggestions/improvements? Please drop us a line. Feel free to download the source and give it a try.

Note:
In order to compile the solution with support for mode 5, USE_MULTI_THREADED must be defined in the Image and FrameGrabber projects. Additionally, boost::thread and boost::asio are used to scale the colour conversion across 2 processors and the relevant boost include and library paths need to be configured in Visual Studio.
