Video processing project blog: 2012

Tuesday, November 20, 2012

RTCP Timeout Interval (RFC3550 vs RFC4585)

The main advantages of using the Extended RTP Profile for RTCP-Based Feedback (RTP/AVPF, RFC 4585) over the RTCP timing rules of RFC 3550 are:

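Reduced minimum interval: the 5-second minimum interval of RFC 3550 has been removed. Under RFC 3550, the calculated interval is multiplied by a random factor in [0.5, 1.5] and divided by the compensation factor e - 3/2 ≈ 1.21828, so the interval falls somewhere in [2.5/1.21828, 7.5/1.21828] s, i.e. roughly [2.05, 6.16] s, unless the optional reduced minimum is used.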
Immediate/early feedback mode: important events can be communicated back to the sender provided that certain conditions are met.

Figure 1 illustrates the differences between the RFC 3550 and RFC 4585 timeout intervals for unicast scenarios. The bottom and top dotted lines represent the minimum and maximum possible intervals, the dashed line in the middle is the deterministic value of the timeout, and the solid line includes the effect of randomisation.

An average RTCP packet size of 96 bytes was used in the calculations. The initial timeout value was not considered.

Figure 1: RTCP Timeout Intervals

This shows that, with AVPF, the mean RTCP interval for bitrates of around 100 kbps and above is below 200 ms, a big step down from the 5 s minimum of RFC 3550.
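The interval calculation behind such a plot can be sketched in Python, following the rtcp_interval() algorithm in appendix A.7 of RFC 3550. This is a minimal sketch, not the code used to produce Figure 1: the AVPF case is modelled simply by setting the minimum interval to 0, the initial (halved) minimum is ignored, and the function and parameter names are illustrative.

```python
import math
import random

# Constants from RFC 3550 (section 6.2 and appendix A.7).
RTCP_BW_FRACTION = 0.05        # RTCP gets 5% of the session bandwidth
SENDER_BW_FRACTION = 0.25      # a quarter of that is reserved for senders
COMPENSATION = math.e - 1.5    # ~1.21828, compensates for timer reconsideration


def deterministic_interval(members, senders, we_sent, session_bw_bps,
                           avg_rtcp_size=96.0, t_min=5.0):
    """Calculated interval Td in seconds, before randomisation.

    t_min=5.0 gives the RFC 3550 behaviour; t_min=0.0 models the removed
    minimum of AVPF (an assumption of this sketch). The initial, halved
    minimum is not modelled.
    """
    rtcp_bw = RTCP_BW_FRACTION * session_bw_bps / 8.0   # octets per second
    n = members
    if senders <= members * SENDER_BW_FRACTION:
        # Senders and receivers share separate slices of the RTCP bandwidth.
        if we_sent:
            rtcp_bw *= SENDER_BW_FRACTION
            n = senders
        else:
            rtcp_bw *= 1.0 - SENDER_BW_FRACTION
            n = members - senders
    return max(avg_rtcp_size * n / rtcp_bw, t_min)


def interval_bounds(td):
    """Smallest and largest randomised interval for a given Td."""
    return 0.5 * td / COMPENSATION, 1.5 * td / COMPENSATION


def randomised_interval(td, rng=random):
    """Td scaled by a uniform factor in [0.5, 1.5], then divided by e - 3/2."""
    return td * rng.uniform(0.5, 1.5) / COMPENSATION
```

For example, interval_bounds(deterministic_interval(2, 1, True, 64_000)) gives roughly (2.05, 6.16) seconds, since at 64 kbps the RFC 3550 minimum of 5 s dominates; passing t_min=0.0 lets the interval shrink as the bandwidth grows.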

Sunday, June 24, 2012

Using the FrameSkippingFilter to drop the framerate

Sometimes it is desirable to reduce the framerate of a video. In the FrameSkippingFilter, the user can select a value n so that one frame in every n is dropped; setting n = 2, for example, halves the framerate.
The filter could easily be extended to support more complex dropping schemes, but currently does not.
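A minimal sketch of the dropping rule in Python, assuming frames are counted from 0 and the last frame of each group of n is the one dropped (which frame the filter actually drops is not specified here). The names keep_frame, skip_frames and output_framerate are hypothetical.

```python
def keep_frame(index, n):
    """True if frame `index` (0-based) survives when one frame in every n is dropped.

    Assumed convention: the last frame of each group of n is dropped.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    return (index + 1) % n != 0


def skip_frames(frames, n):
    """Return the frames that would be passed downstream."""
    return [f for i, f in enumerate(frames) if keep_frame(i, n)]


def output_framerate(fps, n):
    """Framerate after dropping one frame in every n."""
    return fps * (n - 1) / n


print(skip_frames(list(range(10)), 2))  # [0, 2, 4, 6, 8]
print(output_framerate(25.0, 2))        # 12.5
```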

An example media pipeline shows the results:

One FramerateDisplayFilter has been inserted before the FrameSkippingFilter and another after it to illustrate the effect.

As can be seen in the screen capture, the framerate has been halved.

Using the FramerateDisplayFilter

The FramerateDisplayFilter is useful for checking what framerate is being achieved in a live multimedia pipeline, where, for example, an encoder may not be compressing the media fast enough. It is sometimes desirable to know what framerate is achievable with different resolutions, encoder modes, etc. The filter could also be useful in a system that performs dynamic bitrate adaptation.

The FramerateDisplayFilter uses a moving average over the last 50 samples to estimate the framerate and renders the estimate on top of the video using GDI+.

The FramerateDisplayFilter inherits CTransInPlaceFilter and currently has the following configuration options:

mode: time-stamp or system-time
X: x-position of the estimate (this can be off-screen)
Y: y-position of the estimate (this can be off-screen)

In time-stamp mode, the timestamps of the media samples themselves are used in the average calculation.

In system-time mode, the system time at which each sample passes through the filter is used in the average calculation.

Depending on the pipeline, there may be a minor or larger difference between the two.
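The moving-average estimate and the two modes can be sketched in Python. This is an assumed structure, not the filter's code: the estimate is the number of intervals in the window divided by the time they span, time-stamp mode feeds the sample start time (e.g. as returned by IMediaSample::GetTime, in 100 ns REFERENCE_TIME units), and system-time mode feeds the arrival time. FramerateEstimator and on_sample are hypothetical names.

```python
import time
from collections import deque

UNITS = 10_000_000  # REFERENCE_TIME ticks per second (100 ns units)


class FramerateEstimator:
    """Moving-average framerate over the last `window` samples (a sketch)."""

    def __init__(self, window=50):
        self._times = deque(maxlen=window)  # oldest samples fall out automatically

    def add_sample(self, t_seconds):
        self._times.append(t_seconds)

    def fps(self):
        # Mean inter-frame interval over the window, inverted.
        if len(self._times) < 2:
            return 0.0
        span = self._times[-1] - self._times[0]
        return (len(self._times) - 1) / span if span > 0 else 0.0


def on_sample(estimator, mode, sample_start_rt=None):
    """Feed one sample; mode is 'time-stamp' or 'system-time'."""
    if mode == "time-stamp":
        estimator.add_sample(sample_start_rt / UNITS)   # media timestamp
    else:
        estimator.add_sample(time.monotonic())          # arrival time at the filter
```

The difference between the modes mentioned above shows up whenever samples are timestamped at one rate but delivered at another, e.g. when an upstream encoder falls behind.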

As usual, all settings are programmatically configurable via the COM ISettingsInterface.

On a side note, if anyone is interested in contributing to the development of this filter, the ability to set the font, font colour, etc. via the property page is still required.

Building the VPP for 64-bit Windows

One difference between targeting 32-bit and 64-bit Windows is the VC compiler that is used. The directories containing the tools needed to target the various environments can be found under C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin; more about this can be read on MSDN.

The include and linker paths also differ. Since we launch VS via our custom batch file, we need to update the environment to target 64-bit Windows: in VsVersion.bat, set TARGET=X86_64.

In 64-bit builds a different configuration is used in Visual Studio:

Additionally, one has to be sure that x64 is set as the target for static libraries built in the project.

If any mismatches are detected between x86 and x64, the compilation will fail with the following error message:

fatal error LNK1112: module machine type 'x64' conflicts with target machine type 'X86'

A few more tips if you run into compilation errors:

Make sure that you clean the solution.
Check that the correct lib and bin directories are being used.
Double-check that the obj files created during compilation are 64-bit. You can do this by running dumpbin /HEADERS somefile.obj
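As a quick way to check many files at once, a small Python sketch can read the Machine field at the start of each COFF object file (0x014C for x86, 0x8664 for x64). It assumes ordinary COFF .obj files; objects built with whole-program optimisation (/GL) start with a different, anonymous header, which the sketch only reports, so dumpbin remains the authoritative check. scan and machine_from_header are hypothetical helpers.

```python
import os
import struct

# IMAGE_FILE_HEADER.Machine values for the two targets discussed here.
MACHINE_NAMES = {0x014C: "x86", 0x8664: "x64"}


def machine_from_header(data):
    """Classify the first four bytes of a COFF object file."""
    if len(data) < 4:
        return "too short"
    machine, second = struct.unpack("<HH", data[:4])
    if machine == 0 and second == 0xFFFF:
        # Anonymous object header (e.g. /GL builds): use dumpbin instead.
        return "anonymous header"
    return MACHINE_NAMES.get(machine, "unknown (0x%04X)" % machine)


def scan(root="."):
    """Print the machine type of every .obj file below root."""
    for dirname, _, filenames in os.walk(root):
        for filename in filenames:
            if filename.lower().endswith(".obj"):
                path = os.path.join(dirname, filename)
                with open(path, "rb") as f:
                    print(machine_from_header(f.read(4)), path)
```

Calling scan(r"VPP_ROOT\Projects") (adjusting the path to your checkout) lists each object file with its target, which makes a stray x86 object easy to spot before it triggers LNK1112.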

If all went well, you should be able to load the 64-bit version of the filter in the 64-bit version of GraphStudio. You might have to make some tweaks to the paths depending on the platform you're building on, and on the target platform.

Saturday, June 23, 2012

Getting started with the video processing project

This post focuses on how to get the source and build the projects of the Video Processing Project. We will be using Visual Studio 2010 on Windows 7.

Checking out the source

The first step is to check out the source code from SourceForge to a directory, VPP_ROOT, on the local hard drive: http://videoprocessing.svn.sourceforge.net/svnroot/videoprocessing/trunk/videoprocessing

Launching Visual Studio

Then navigate to the VPP_ROOT\Projects\Win32\Launch directory in Explorer. This directory contains a batch file named RTVC.bat, via which Visual Studio must be launched. The main reason for this is to make it possible to install the VPP on various machines with different versions of Visual Studio, the Windows SDK and Windows, and to get going quickly by making a few simple changes to the environment in these batch files. Executing the batch file opens the VPP Visual Studio solution. Note that on Windows 7/Vista, RTVC.bat must be executed from a console started as administrator if you want to register the DirectShow filters from inside Visual Studio. If you do not start it as administrator, you will see the following errors on building the solution:

Optionally configuring VPP for your environment

Typically no modification of the batch files is necessary. However, this depends on the specific environment (VS and Windows SDK installation drives, Windows version, etc.). At the very least, you might have to select the version of Visual Studio.

Selecting the Visual Studio version

In VsVersion.bat, one can change the VS version by editing the VS_VERSION variable.

 VS_VERSION=VC10

Change this to the Visual Studio version you are intending to use. The default is VS2010.

Using Visual Studio Express

Additionally, the project can be built using the express version of Visual Studio by setting

 VC_EXE=VCExpress.exe

Windows SDK detection

In User.bat the Windows SDK version is detected. This can be overridden manually by setting DSHOWBASECLASSES and WindowsSDKDir.

Building the VPP

If everything is configured correctly, Visual Studio should be launched and the solution can be built by hitting F7.

Tuesday, June 12, 2012

Using the VPP H.264 DirectShow filter

In this post, we take a quick look at how the VPP H.264 filter can be used in GraphEdit. The H.264 filter accepts both the RGB24 and the I420 media types, making it compatible with the VPP YUV source filter. The filter can be configured via its property page, which currently contains the options shown in Figure 1. More options will be added in the future.

Figure 1

At the bottom of the property page, one can tick a checkbox in order to use the standard Microsoft H.264 decoder that ships with Windows 7. If the box is unchecked, the VPP decoder, which uses a custom media type, will be used. Figure 2 shows a graph in which the stock MS H.264 decoder is used to decode the video.

Figure 2

Currently, there are two Modes of Operation that can be selected: mode 0 and mode 1. In mode 0, the Quality of the video can be configured. Valid values lie in the range [0, 51], with 0 giving the best quality and 51 the worst. Figure 3 shows a graph in which the first encoder has Quality 0 and the second encoder Quality 35.

Figure 3

Alternatively, setting Mode of Operation to 1 allows the frame bit limit to be set, which effectively limits the bitrate of the media stream. As the name suggests, the frame bit limit is measured in bits per frame. This means that to achieve a rate of 128 kb/s with a source video of 10 frames per second, one would need to set the frame bit limit to 12.8 * 1024 ≈ 13107 bits per frame (taking 1 kb = 1024 bits). In a live media pipeline, the VPP framerate estimator filter may be useful for measuring the approximate framerate of the video source.
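The same arithmetic as a small Python helper; the function name and the 1 kb = 1024 bits default are assumptions taken from the example, and the result is rounded down so that the target rate is not exceeded.

```python
def frame_bit_limit(bitrate_kbps, fps, bits_per_kb=1024):
    """Bits per frame needed to stay at or below a target bitrate."""
    return int(bitrate_kbps * bits_per_kb // fps)


print(frame_bit_limit(128, 10))  # 13107
```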

The other option of interest is the I-frame Period, which creates periodic IDR frames, i.e. every n-th frame is encoded as an IDR frame.

All parameters can be set programmatically using the COM ISettingsInterface, which all (or most) of the VPP DirectShow filters inherit, but that is a post for another day...

P.S. The Notify on I-frame and Prepend parameter sets options are no longer used and will be removed in the next release of the software.

Tuesday, May 22, 2012

Cleaning up after VS projects builds

Visual Studio creates large numbers of temporary files, which over time can occupy gigabytes of hard drive space.
Here's a little Python script that recursively removes many of these files.

import os

# Extensions of Visual Studio build/IDE by-products to delete (adapt as required).
ext_list = ["obj", "idb", "manifest", "pdb", "ncb", "suo", "pch", "pchi", "sdf",
            "embed.manifest", "intermediate.manifest", "embed.manifest.res",
            "res", "dep"]
# Match on the file's real extension, case-insensitively; a literal "." is
# required, so names such as "genres" are never treated as ".res" files.
suffixes = tuple("." + ext for ext in ext_list)

total_size_bytes = 0
for dirname, _, filenames in os.walk("."):
    for filename in filenames:
        if filename.lower().endswith(suffixes):
            path = os.path.join(dirname, filename)
            size = os.path.getsize(path)
            total_size_bytes += size
            print("Deleting: %s %d Bytes" % (path, size))
            os.remove(path)

print("Total: %d KB (%d Bytes)" % (total_size_bytes // 1024, total_size_bytes))
print("Total: %d MB (%.2f GB)" % (total_size_bytes // 1024 ** 2,
                                  total_size_bytes / 1024.0 ** 3))

Redirect the script's output to a text file to keep track of which files were deleted:

 cleanup.py > cleanup.txt

Adapt the file extensions as required.


Wednesday, March 14, 2012

The Visual Studio 2011 Developer Preview

Today I thought I'd give the new VS 2011 Developer Preview a bash. My main interest was to see how C++ AMP can be used to increase code performance.

This time the migration of projects to the latest VS version was relatively painless in comparison with the upgrade to VS2010. The one error I ran into was

Cannot open include file: 'sal.h': No such file or directory

which was easily fixed by adding "C:\Program Files (x86)\Windows Kits\8.0\Include\shared" to the list of include directories.

Remembering how the performance of an H.263 codec improved *notably* when we migrated from VS2008 to VS2010, I was curious to find out what the VC team had done in 2011. After rebuilding the Video Processing Project solution and running the FrameGrabber application, the following results were observed:

Mode  Total average  Per frame  Improvement %
0     872.10 ms      0.86 ms    -
1     859.45 ms      0.75 ms    13.42%
2     816.97 ms      0.56 ms    35.26%
3     879.01 ms      0.82 ms    5.04%
5     N/A            N/A        N/A

Comparing these results to the ones obtained using VS2010 as posted in Improving live multimedia pipeline performance

Mode  Total average  Per frame  Improvement %
0     1224.46 ms     1.16 ms    -
1     1105.63 ms     1.02 ms    12.26%
2     969.82 ms      0.55 ms    53.10%
3     1572.18 ms     1.24 ms    -6.74%
5     1106.09 ms     0.59 ms    49.13%

The results were obtained by running the FrameGrabber application 5 times and taking the average. It looks like the compiler team has done some serious work optimising the generated code. Even though the relative ordering of the modes is broadly similar, the gap between them has narrowed considerably. This is not to say that optimisation is any less important with newer compilers: a 35.26% improvement is nothing to frown upon.

Looks like the free lunch isn't quite over yet, as long as you can afford a new compiler/IDE :-)

UPDATE:

This might have something to do with the auto-vectorizer in VS2011: http://channel9.msdn.com/Shows/C9-GoingNative/GoingNative-7-VC11-Auto-Vectorizer-C-NOW-LangNEXT

Improving live multimedia pipeline performance

In this post, we will discuss code optimisation techniques necessary in real-time media pipelines. Live video requires that the media be processed fast enough to achieve the desired framerate: e.g. a framerate of 15 fps means that each frame may take no longer than 1000/15 ≈ 66.7 ms.

A video pipeline typically consists of a media source, colour converters, scalers, croppers, video mixers, video codecs and media sinks.

Media pipeline

This means that all the operations together cannot take more than about 66.7 ms.
Although operations such as colour conversion are considered lightweight in relation to video encoding, each link in the chain should be written as efficiently as possible, within reasonable means. (Shaving a millisecond off colour conversion is not really going to make much difference if the encoder takes 50 ms per frame.)
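The per-frame budget can be expressed as a quick check in Python. The stage names and timings below are made up for illustration; only the 1000/fps budget comes from the text.

```python
def frame_budget_ms(fps):
    """Maximum processing time per frame, in milliseconds."""
    return 1000.0 / fps


def fits_budget(stage_ms, fps):
    """True if the summed per-frame cost of all pipeline stages fits the budget."""
    return sum(stage_ms.values()) <= frame_budget_ms(fps)


# Hypothetical per-frame costs in ms, for illustration only.
stages = {"source": 2.0, "colour conversion": 1.2, "scaler": 3.0, "encoder": 50.0}
print(round(frame_budget_ms(15), 1))  # 66.7
print(fits_budget(stages, 15))        # True
print(fits_budget(stages, 20))        # False (the budget drops to 50 ms)
```

With numbers like these, the encoder dominates: halving the colour conversion cost barely moves the total, which is the point made in the parenthesis above.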

The question is: how can we improve algorithm performance?

Using fixed-point arithmetic instead of floating point
Fewer copies
Lookup tables instead of computation
Multi-threading
Increasing cache hits
Using processor-specific knowledge (e.g. SIMD)
Using GPUs?
Improving the algorithm structure (the big picture)

There is usually some kind of trade-off between speed and memory usage. In the case of the look-up table approach, there could be a slight computational overhead on start-up to compute the look-up table, with the benefit of fewer computations once the application is in a steady state. One should also take factors such as the size of the look-up table and the target environment (i.e. desktop vs. embedded device) into account.

In this post, we will try out various techniques to improve the performance of the RGB to YUV420 colour converter (source code available at the Video Processing Project).
The FrameGrabber project builds a simple multimedia pipeline consisting of a source, a sample grabber and a video renderer.

Once the sample grabber callback is triggered, we do the following:
- convert from RGB to YUV420
- convert back to RGB
- render image for visual confirmation that the conversion is correct.

The original colour conversion code can be seen in RealRGB24toYUV420Converter.cpp.
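For reference, a floating-point RGB to YUV420 (I420) conversion can be sketched in Python. This is not the VPP code: it assumes the common BT.601 studio-range coefficients, top-down rows of (r, g, b) tuples (DirectShow RGB24 buffers are actually stored as BGR and usually bottom-up), even dimensions, and chroma averaged over each 2x2 block.

```python
def rgb_to_yuv_float(r, g, b):
    """BT.601 studio-range conversion (assumed coefficients), floating point."""
    y = 0.257 * r + 0.504 * g + 0.098 * b + 16
    u = -0.148 * r - 0.291 * g + 0.439 * b + 128
    v = 0.439 * r - 0.368 * g - 0.071 * b + 128
    return y, u, v


def clamp8(x):
    """Round and clamp to the unsigned 8-bit range."""
    return max(0, min(255, int(round(x))))


def rgb_to_i420(pixels, width, height):
    """Convert top-down rows of (r, g, b) tuples to I420 Y, U and V planes.

    Width and height must be even; U and V are averaged over each 2x2 block.
    """
    y_plane = bytearray(width * height)
    u_plane = bytearray((width // 2) * (height // 2))
    v_plane = bytearray((width // 2) * (height // 2))
    for j in range(0, height, 2):
        for i in range(0, width, 2):
            u_sum = v_sum = 0.0
            for dj in (0, 1):
                for di in (0, 1):
                    y, u, v = rgb_to_yuv_float(*pixels[j + dj][i + di])
                    y_plane[(j + dj) * width + i + di] = clamp8(y)
                    u_sum += u
                    v_sum += v
            c = (j // 2) * (width // 2) + i // 2
            u_plane[c] = clamp8(u_sum / 4)
            v_plane[c] = clamp8(v_sum / 4)
    return y_plane, u_plane, v_plane
```

The Y plane is full resolution while U and V each hold a quarter as many samples, so an I420 frame occupies width * height * 3 / 2 bytes.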

We will try to improve on this by adding a lookup table to minimize the multiplications as can be seen in FastLookupTableRGB24toYUV420Converter.cpp.
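A sketch of the lookup-table idea in Python, assuming BT.601 studio-range coefficients: every coefficient-times-value product is precomputed once for all 256 input values, so a pixel needs only table reads and additions. The table layout and names are hypothetical, and in Python the lookups will not actually beat the multiplications; the sketch only shows the structure and the start-up cost (nine tables of 256 entries).

```python
# Coefficients of the assumed BT.601 studio-range equations.
COEFFS = {
    "yr": 0.257, "yg": 0.504, "yb": 0.098,
    "ur": -0.148, "ug": -0.291, "ub": 0.439,
    "vr": 0.439, "vg": -0.368, "vb": -0.071,
}

# Built once at start-up: 9 tables of 256 precomputed products each.
TABLES = {name: [c * x for x in range(256)] for name, c in COEFFS.items()}


def rgb_to_yuv_direct(r, g, b):
    """Reference version: nine multiplications per pixel."""
    c = COEFFS
    return (c["yr"] * r + c["yg"] * g + c["yb"] * b + 16,
            c["ur"] * r + c["ug"] * g + c["ub"] * b + 128,
            c["vr"] * r + c["vg"] * g + c["vb"] * b + 128)


def rgb_to_yuv_lut(r, g, b, t=TABLES):
    """Lookup-table version: table reads and additions only."""
    return (t["yr"][r] + t["yg"][g] + t["yb"][b] + 16,
            t["ur"][r] + t["ug"][g] + t["ub"][b] + 128,
            t["vr"][r] + t["vg"][g] + t["vb"][b] + 128)
```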

Next, we approach the problem using fixed-point arithmetic, as can be seen in FastFixedPointRGB24toYUV420Converter.cpp.
Here the idea is to use integer arithmetic instead of floating point.
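A sketch of a fixed-point variant in Python, assuming the widely used 8-bit integer approximation of the BT.601 studio-range equations: coefficients are scaled by 256, and (x + 128) >> 8 acts as a rounded division by 256. Python's >> floors negative values, like an arithmetic right shift. This is an illustration of the technique, not the code in FastFixedPointRGB24toYUV420Converter.cpp.

```python
def rgb_to_yuv_fixed(r, g, b):
    """Integer-only BT.601 conversion with coefficients scaled by 256."""
    y = ((66 * r + 129 * g + 25 * b + 128) >> 8) + 16
    u = ((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128
    v = ((112 * r - 94 * g - 18 * b + 128) >> 8) + 128
    return y, u, v


def rgb_to_yuv_float(r, g, b):
    """Floating-point reference using the unscaled coefficients."""
    return (0.257 * r + 0.504 * g + 0.098 * b + 16,
            -0.148 * r - 0.291 * g + 0.439 * b + 128,
            0.439 * r - 0.368 * g - 0.071 * b + 128)


# The integer result stays within 1 of the rounded floating-point result.
for rgb in [(0, 0, 0), (255, 255, 255), (255, 0, 0), (0, 255, 0), (0, 0, 255)]:
    fixed = rgb_to_yuv_fixed(*rgb)
    ref = [round(c) for c in rgb_to_yuv_float(*rgb)]
    assert all(abs(a - b) <= 1 for a, b in zip(fixed, ref)), rgb
```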

Finally we attempted to use SIMD instructions to improve the colour converter performance as can be seen in FastSimdRGB24toYUV420Converter.cpp.

The FrameGrabber application is called with the following parameters:
FrameGrabber <<File>>.avi mode=0
where
mode 0 = original algorithm
mode 1 = lookup table
mode 2 = fixed-point arithmetic
mode 3 = SIMD
mode 4 = GPU (unimplemented)
mode 5 = multi-threaded

The standard Foreman test video sequence with CIF resolution was used as the video source in this experiment. The application was run 5 times per mode using an automated script and the results were averaged.

Mode  Total average  Per frame  Improvement %
0     1224.46 ms     1.16 ms    -
1     1105.63 ms     1.02 ms    12.26%
2     969.82 ms      0.55 ms    53.10%
3     1572.18 ms     1.24 ms    -6.74%
5     1106.09 ms     0.59 ms    49.13%

As expected, using a look-up table yields a notable improvement over the original algorithm. The fixed-point arithmetic performs best of all and is roughly twice as fast as the original algorithm. Surprisingly, the SIMD approach yielded no improvement and in fact performed slightly worse than the original. This could, however, be an implementation issue. (If you have a better solution, please drop us a line.) FYI, the question was posted on Stack Overflow. The multi-threaded approach also yields a performance gain, though it should be used with caution: I would not advise spawning additional threads for the sole purpose of optimising colour conversion.

Comments/criticism/suggestions/improvements? Please drop us a line. Feel free to download the source and give it a try.

Note:
In order to compile the solution with support for mode 5, USE_MULTI_THREADED must be defined in the Image and FrameGrabber projects. Additionally, boost::thread and boost::asio are used to scale the colour conversion across 2 processors and the relevant boost include and library paths need to be configured in Visual Studio.
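The VPP uses boost::thread and boost::asio for mode 5; the partitioning idea can be sketched in Python with concurrent.futures. This is a hypothetical illustration: the frame is split into bands with an even number of rows, so that each 2x2 chroma block of an I420 frame belongs to a single worker, and only the luma plane is computed. Because of CPython's global interpreter lock, pure-Python loops like these will not actually run faster with threads.

```python
from concurrent.futures import ThreadPoolExecutor


def row_bands(height, workers):
    """Split an even frame height into bands with an even number of rows."""
    pairs = height // 2                      # 2-row groups (one chroma row each)
    base, extra = divmod(pairs, workers)
    bands, start = [], 0
    for k in range(workers):
        rows = 2 * (base + (1 if k < extra else 0))
        bands.append((start, start + rows))
        start += rows
    return bands


def luma_rows(pixels, width, y_plane, first, last):
    """Fixed-point BT.601 luma for rows [first, last), written into y_plane."""
    for j in range(first, last):
        for i in range(width):
            r, g, b = pixels[j][i]
            y_plane[j * width + i] = ((66 * r + 129 * g + 25 * b + 128) >> 8) + 16


def luma_parallel(pixels, width, height, workers=2):
    """Compute the Y plane with one band per worker; bands never overlap."""
    y_plane = bytearray(width * height)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        jobs = [pool.submit(luma_rows, pixels, width, y_plane, a, b)
                for a, b in row_bands(height, workers)]
        for job in jobs:
            job.result()  # re-raise any exception from a worker
    return y_plane
```

For a CIF frame and two workers, row_bands(288, 2) gives [(0, 144), (144, 288)], matching the two-processor split described in the note.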

Wednesday, March 7, 2012

H.264 implementation update

The H.264 implementation we have been working on is finally nearing completion and will be added to the Video Processing Project in the near future. The author of the H.264 codec wrote the following explanation regarding the usage of the codec:

Implementing a DirectShow H264 source filter

After not finding a suitable DirectShow source filter able to render raw H.264 files, we decided to roll our own (available at the Video Processing Project). I'm sure that many developers have written one of these, and perhaps it's time to stop reinventing the wheel. Should anyone like to contribute improvements/extensions to this filter, please drop us a line.

The H.264 reference software allows one to take a YUV file and encode it into a .264 file format. These .264 files consist of a sequence of NAL units, each prepended with a start code (0x00000001). A source filter would thus have to read one of these files, break it up into separate NAL units, and then pass one frame at a time to the decoder. Windows 7 features a built-in H.264 decoder.
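Splitting a .264 byte stream into NAL units can be sketched in Python. This assumes an Annex B stream and accepts both the 4-byte start code mentioned above and the 3-byte form 0x000001; trailing zero bytes are stripped from each unit, which is safe because an H.264 NAL unit never ends in a 0x00 byte. Grouping the units into frames is left out, and split_nal_units is a hypothetical name.

```python
def split_nal_units(data):
    """Split an Annex B byte stream into NAL units, with start codes removed."""
    units = []
    start = None
    i = 0
    n = len(data)
    while i + 3 <= n:
        if data[i] == 0 and data[i + 1] == 0 and data[i + 2] == 1:
            if start is not None:
                # Drop the extra leading zero of a 4-byte start code, if any.
                units.append(bytes(data[start:i]).rstrip(b"\x00"))
            i += 3
            start = i
        else:
            i += 1
    if start is not None:
        units.append(bytes(data[start:]).rstrip(b"\x00"))
    return units
```

Emulation prevention bytes guarantee that the sequence 00 00 01 cannot occur inside a NAL unit, which is what makes this simple scan sufficient.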

The IFileSourceFilter interface is implemented to facilitate loading of .264 files. This causes GraphEdit/GraphStudio to display a dialog box in which one can select the desired .264 file.

One of the first steps in writing a source filter is to provide the correct output pin media type, which allows the DirectShow framework to render the graph. In this case, MEDIASUBTYPE_H264 was selected since using it requires the least effort. The implementation of GetMediaType looks as follows:

HRESULT H264OutputPin::GetMediaType(CMediaType *pMediaType)
{
  CAutoLock cAutoLock(m_pFilter->pStateLock());
  CheckPointer(pMediaType, E_POINTER);

  pMediaType->InitMediaType();
  pMediaType->SetType(&MEDIATYPE_Video);
  pMediaType->SetSubtype(&MEDIASUBTYPE_H264);
  pMediaType->SetFormatType(&FORMAT_VideoInfo2);
  VIDEOINFOHEADER2* pvi2 = (VIDEOINFOHEADER2*)pMediaType->AllocFormatBuffer(
    sizeof(VIDEOINFOHEADER2));
  if (pvi2 == NULL) return E_OUTOFMEMORY;
  ZeroMemory(pvi2, sizeof(VIDEOINFOHEADER2));
  // biSize is the size of the header structure itself (40 bytes).
  pvi2->bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
  pvi2->bmiHeader.biBitCount = 24;
  pvi2->bmiHeader.biPlanes = 1;
  pvi2->bmiHeader.biWidth = m_pFilter->m_iWidth;
  pvi2->bmiHeader.biHeight = m_pFilter->m_iHeight;
  pvi2->bmiHeader.biSizeImage = DIBSIZE(pvi2->bmiHeader);
  pvi2->bmiHeader.biCompression = DWORD('1cva');
  // Nominal 25 fps; REFERENCE_TIME is in 100 ns units.
  const REFERENCE_TIME FPS_25 = UNITS / 25;
  pvi2->AvgTimePerFrame = FPS_25;
  SetRect(&pvi2->rcSource, 0, 0, m_pFilter->m_iWidth, m_pFilter->m_iHeight);
  pvi2->rcTarget = pvi2->rcSource;
  pvi2->dwPictAspectRatioX = m_pFilter->m_iWidth;
  pvi2->dwPictAspectRatioY = m_pFilter->m_iHeight;
  return S_OK;
}

This code is sufficient to allow DirectShow to insert the Windows H.264 decoder into the pipeline. Here the width and height seem to be of little importance since they are in any case communicated in the (H.264) Sequence Parameter Set. The parameter sets are found by scanning the .264 file for the appropriate NAL units. The NAL unit type can be extracted from the NAL unit header as follows:

unsigned char uiNalUnitType = nalUnitHeader & 0x1f;

Sequence parameter sets have value 7, picture parameter sets 8, and slices of IDR frames value 5. These typically need to be passed to the decoder before any other encoded frames.
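The type extraction can be sketched in Python, together with a hypothetical helper that picks out the first SPS, PPS and IDR slice to hand to a decoder first. The type names follow Table 7-1 of the H.264 specification; decoder_start and describe are illustrative names, not part of the filter.

```python
# A few NAL unit types from Table 7-1 of the H.264 specification.
NAL_TYPES = {1: "non-IDR slice", 5: "IDR slice", 6: "SEI", 7: "SPS", 8: "PPS"}


def nal_unit_type(nal):
    """nal_unit_type is the low 5 bits of the NAL unit header byte."""
    return nal[0] & 0x1F


def describe(nal):
    return NAL_TYPES.get(nal_unit_type(nal), "other")


def decoder_start(nals):
    """Return [SPS, PPS, first IDR slice] from a list of NAL units, or None."""
    sps = next((u for u in nals if nal_unit_type(u) == 7), None)
    pps = next((u for u in nals if nal_unit_type(u) == 8), None)
    idr = next((u for u in nals if nal_unit_type(u) == 5), None)
    if sps is None or pps is None or idr is None:
        return None
    return [sps, pps, idr]
```

For example, a header byte of 0x67 gives type 7 (SPS), 0x68 gives 8 (PPS) and 0x65 gives 5 (IDR slice).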

Finally, the filter is also able to output a custom media type, namely MEDIASUBTYPE_H264M. This makes it easy to test our own H.264 decoder filter. In the property pages of the source filter, one can select the output media type of the H.264 source filter. It should be noted that our H.264 decoder has limitations regarding the implemented parts of the specification, as described in this post.

Should you wish to be able to drag and drop .264 files into GraphStudio, run the registry scripts in the videoprocessing\Projects\Win32\Launch directory. Unfortunately, these have only been tested on Windows 7.

Improvements/suggestions/corrections are of course welcome!

Tuesday, March 6, 2012

Introducing a DirectShow YUV source filter

This post introduces a YUV source filter that can be used to load standard YUV test sequences into the DirectShow environment. Certain YUV test sequences (such as the Foreman sequence pictured on the left) have become widely used by researchers and developers active in the video coding field and many of them are currently available here.

The YUV source filter is part of the Video Processing Project and the source can be downloaded and reused under a BSD license.

Once registered with the OS (regsvr32), the YUV source filter appears under the DirectShow filters category in GraphStudio.

Once inserted into the graph, the user must select the YUV file in the IFileSourceFilter dialog:

The output pin of the YUV source filter can then be rendered.

Since the YUV file format of the test sequence videos contains no information regarding image dimensions and framerate, these must be manually configured using the property page of the filter. In the case where these are not configured correctly, the application may of course crash.

An attempt has been made to auto-configure the filter using a naming convention: if the filename contains CIF, QCIF, or a string of the form <<width>>x<<height>>, the dimensions are configured automatically. This approach may be refined at a later stage if required.
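The naming convention can be sketched in Python. The tokenising rule, the preference for an explicit WxH over CIF/QCIF, and the function names are assumptions; the sizes used are the standard ones (CIF is 352x288, QCIF 176x144). Since each I420 frame occupies width * height * 3 / 2 bytes, the frame count follows from the file size once the dimensions are known.

```python
import os
import re

# Standard sizes for the named formats mentioned above.
NAMED_SIZES = {"QCIF": (176, 144), "CIF": (352, 288)}


def dimensions_from_filename(path):
    """Guess (width, height) from a YUV file name, or None if unknown."""
    name = os.path.splitext(os.path.basename(path))[0].upper()
    # Split on anything that is not a letter or digit, so "SPECIFIC" != "CIF".
    tokens = re.split(r"[^A-Z0-9]+", name)
    for token in tokens:
        m = re.fullmatch(r"(\d+)X(\d+)", token)
        if m:
            return int(m.group(1)), int(m.group(2))
    for token in tokens:
        if token in NAMED_SIZES:
            return NAMED_SIZES[token]
    return None


def i420_frame_count(file_size, width, height):
    """Number of whole I420 frames (full-size Y plus quarter-size U and V)."""
    return file_size // (width * height * 3 // 2)
```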

Color conversion in DirectShow

In this post we will look at various aspects of writing DirectShow color conversion filters. In our Video Processing Project we released a set of filters that convert between the RGB24 and YUV420 planar color formats. These filters were originally written to convert RGB to YUV for the purpose of encoding video as H.263.

During a later stage of the project, we undertook some compliance testing to make sure that the converted YUV format is compatible with the MS YUV formats i.e. the filters would be regarded as compliant if the YUV to RGB conversion filter is interchangeable with the standard AVI decompressor that is usually inserted by DirectShow. Further, our filters needed to be able to convert the standard test sequence videos from YUV420 to RGB. In particular, our RGB to YUV converter outputs MEDIASUBTYPE_I420 (which is the format that the test sequence videos use).

Getting the filters to be compatible with both the MS formats and the test sequence formats required us to add a chrominance offset parameter to the filter, which allows the chroma values to be mapped into the unsigned 8-bit range [0, 255].

Further, we noticed that we needed to flip the image during color conversion.
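Uncompressed RGB DIBs with a positive biHeight are stored bottom-up, with each row padded to a multiple of 4 bytes, whereas the YUV test sequences are stored top-down, which would explain the need for a flip. A minimal Python sketch of a vertical flip, assuming a packed RGB24 buffer with DIB row padding; dib_stride and flip_vertical are hypothetical helpers, not the filters' code.

```python
def dib_stride(width, bytes_per_pixel=3):
    """Row length in bytes, padded to a multiple of 4 as in a DIB."""
    return (width * bytes_per_pixel + 3) & ~3


def flip_vertical(buf, width, height, bytes_per_pixel=3):
    """Reverse the row order of a packed image (bottom-up <-> top-down)."""
    stride = dib_stride(width, bytes_per_pixel)
    if len(buf) < stride * height:
        raise ValueError("buffer too small")
    out = bytearray(stride * height)
    for row in range(height):
        src = row * stride
        dst = (height - 1 - row) * stride
        out[dst:dst + stride] = buf[src:src + stride]
    return out
```

For CIF, 352 * 3 = 1056 bytes is already a multiple of 4, so the padding only matters for widths where width * 3 is not.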

As a final compatibility test, the video was dumped to file:

Then, using the YUV source filter (also available in the video processing project) we rendered the following graph:

Once the AVI decompressor was able to output our YUV format properly, the filters were regarded as compatible:

As an aside, the color conversion filters allow one to configure the chrominance offset and the flipping of the image via their property pages. Both properties can be changed while the graph is running.
