Porting EBU R128 audio loudness analysis from C to Rust

Over the last few weeks I ported the libebur128 C library to Rust, both with a proper Rust API as well as a 100% compatible C API.

This blog post will be split into 4 parts that will be published over the next weeks

  1. Overview and motivation
  2. Porting approach with various details, examples and problems I ran into along the way
  3. Performance optimizations
  4. Building Rust code into a C library as drop-in replacement

If you’re only interested in the code, that can be found on GitHub and in the ebur128 crate on crates.io.

The initial versions of the ebur128 crate was built around the libebur128 C library (and included its code for ease of building), version 0.1.2 and newer is the pure Rust implementation.

EBU R128

libebur128 implements the EBU R128 loudness standard. The Wikipedia page gives a good summary of the standard, but in short it describes how to measure loudness of an audio signal and how to use this for loudness normalization.

While this intuitively doesn’t sound very complicated, there are lots of little details (like how human ears are actually working) that make this not as easy as one might expect. This results in there being many different ways for measuring loudness and is one of the reasons why this standard was introduced. Of course it is also not the only standard for this.

libebur128 is also the library that I used in the GStreamer loudness normalization plugin, about which I wrote a few weeks ago already. By porting the underlying loudness measurement code to Rust, the only remaining C dependency of that plugin is GStreamer itself.

Apart from that it is used by FFmpeg, but they include their own modified copy, as well as many other projects that need some kind of loudness measurement and don’t use ReplayGain, another older but widely used standard for the same problem.

Why?

Before going over the details of what I did, let me first explain why I did this work at all. libebur128 is a perfectly well working library, in wide use for a long time and probably rather bug-free at this point and it was already possible to use the C implementation from Rust just fine. That’s what the initial versions of the ebur128 crate were doing.

My main reason for doing this was simply because it seemed like a fun little project. It isn’t a lot of code that is changing often so once ported it should be more or less finished and it shouldn’t be much work to stay in sync with the C version. I started thinking about doing this already after the initial release of the C-based ebur128 release, but after reading Joe Neeman’s blog post about porting another C audio library (RNNoise) to Rust this gave me the final push to actually start with porting the code and to follow through until it’s done.

However, don’t go around and ask other people to rewrite their projects in Rust (don’t be rude) or think that your own rewrite is magically going to be much faster and less buggy than the existing implementation. While Rust saves you from a big class of possible bugs, it doesn’t save you from yourself and usually rewrites contain bugs that didn’t exist in the original implementation. Also getting good performance in Rust requires, like in every other language, some effort. Before rewriting any software, think about the goals of this rewrite realistically as well as the effort required to actually get it finished.

Apart from fun there were also a few technical and non-technical reasons for me to look into this. I’m going to just list two here (curiosity and portability). I will skip the usual Rust memory-safety argument as that seems less important with this code: the C code is widely used for a long time, not changing a lot and has easy to follow memory access patterns. While it definitely had a memory safety bug (see above), it was rather difficult to trigger and it was fixed in the meantime.

Curiosity

Personally and at my company Centricular we try to do any new projects where it makes sense in Rust. While this worked very well in the past and we got great results, there were some questions for future projects that I wanted to get some answers, hard data and personal experience for

  • How difficult is it to port a C codebase function by function to Rust while keeping everything working along the way?
  • How difficult is it to get the same or better performance with idiomatic Rust code for low-level media processing code?
  • How much bigger or smaller is the resulting code and do Rust’s higher-level concepts like iterators help to keep code concise?
  • How difficult is it to create a C-compatible library in Rust with the same API and ABI?

I have some answers to all these questions already but previous work on this was not well structured and the results were also not documented, which I’m trying to change here now. Both to have a reference for myself in the future as well as for convincing other people that Rust is a reasonable technology choice for such projects.

As you can see the general pattern of these questions are introducing Rust into an existing codebase, replacing existing components with Rust and writing new components in Rust, which is also relates to my work on the Rust GStreamer bindings.

Portability

C is a very old language and while there is a standard, each compiler has its own quirks and each platform different APIs on top of the bare minimum that the C standard defines. C itself is very portable, but it is not easy to write portable C code, especially when not using a library like GLib that hides these differences and provides basic data structures and algorithms.

This seems to be something that is often forgotten when the portability of C is given as an argument against Rust, and that’s the reason why I wanted to mention this here specifically. While you can get a C compiler basically everywhere, writing C code that also runs well everywhere is another story and C doesn’t make this easy by design. Rust on the other hand makes writing portable code quite easy in my experience.

In practice there were three specific issues I had for this codebase. Most of the advantages of Rust here are because it is a new language and doesn’t have to carry a lot of historical baggage.

Mathematical Constants and Functions

Mathematical constants are not actually part of any C standard. While most compilers just define M_PI (for π), M_E (for 𝖾) and others in math.h nonetheless as they’re defined by POSIX and UNIX98.

Microsoft’s MSVC doesn’t, but instead you have to #define _USE_MATH_DEFINES before including math.h.

While not a big problem per-se, it is annoying and indeed caused the initial version of the ebur128 Rust crate to not compile with MSVC because I forgot about it.

Similarly, which mathematical functions are available depends a lot on the target platform and which version of the C standard is supported. An example of this is the log10 function to calculate the base-10 logarithm. For portability reasons, libebur128 didn’t use it but instead calculated it via the natural logarithm (ln(x) / ln(10) = log10(x)) because it’s only available in POSIX and since C99. While C99 is from 1999, there are still many compilers out there that don’t fully support it, again most prominently MSVC until very recently.

Using log10 instead of going via the natural logarithm is faster and more precise due to floating point number reasons, which is why the Rust implementation uses it but in C it would be required to check at build-time if the function is available or not, which complicates the build process and can easily be forgotten. libebur128 decided to not bother with these complications and simply not use it. Because of that, some conditional code in the Rust implementation is necessary for ensuring that both implementations return the same results in the tests.

Data Structures

libebur128 uses a linked-list-based queue data structure. As the C standard library is very minimal, no collection data structures are included. However on the BSDs and also on Linux with the GNU C library there is one available in sys/queue.h.

Of course MSVC does not have this and other compilers/platforms probably won’t have it either, so libebur128 included a local copy of that queue implementation. Now when building, one has to decide whether there is a system implementation available or otherwise use the internal version. Or simply always use the internal version.

Copying implementations of basic data structures and algorithms into every single project is ugly and error-prone, so let’s maybe not do that. C not having a standardized mechanism for dependency handling doesn’t help with this, which is unfortunately why this is very common in C projects.

One-time Initialization

Thread-safe one-time initialization is another thing that is not defined by the C standard, and depending on your platform there are different APIs available for it or none at all. POSIX again defines one that is widely available, but you can’t really depend on it unconditionally.

This complicates the code and build procedure, so libebur128 simply did not do that and did its one-time initializations of some global arrays every time a new instance was created. Which is probably fine, but a bit wasteful and probably strictly-speaking according to the C standard not actually thread-safe.

The initial version of the ebur128 Rust crate side-stepped this problem by simply doing this initialization once with the API provided by the Rust standard library. See part 2 and part 3 of this blog post for some more details about this.

Easier to Compile and Integrate

A Rust port only requires a Rust compiler, a mixed C/Rust codebase requires at least a C compiler in addition and some kind of build system for the C code.

libebur128 uses CMake, which would be an additional dependency so in the initial version of the ebur128 crate I went via cargo‘s build.rs build scripts and the cc crate as building libebur128 is easy enough. This works but build scripts are problematic for integration of the Rust code into other build systems than cargo.

The Rust port also makes use of conditional compilation in various places. Unlike in C with the preprocessor, non-standardized and inconsistent platform #defines and it being necessary to integrate everything in a custom way into the build system, Rust has a principled and well-designed approach to this problem. This makes it easier to keep the code clean, easier to maintain and more portable.

In addition to build system related simplifications, by not having any C code it is also much easier to compile the code to other targets like WebAssembly, which is natively supported by Rust. It is also possible to compile C to WebAssembly but getting both toolchains to agree with each other and produce compatible code seems not very easy.

Overview

As mentioned above, the code can be found on GitHub and in the ebur128 crate on crates.io.

The current version of the code produces the exact same results as the C version. This is enforced by the quickcheck tests that are running randomized inputs through both versions and check that the results are the same. The code also succeeds all the tests in the EBU loudness test set, so should hopefully be standards compliant as long as the test implementation is not wrong.

Performance-wise the Rust implementation is at least as fast as the C implementation. In some configurations it’s a few percent faster but probably not enough that it actually matters in practice. There are various benchmarks for both versions in different configurations available. The benchmarks are based on the criterion crate, which uses statistical methods to give as accurate as possible results. criterion also generates nice results with graphs for making analysis of the results more pleasant. See part 3 of this blog post for more details.

Writing tests and benchmarks for Rust is so much easier and feels more natural then doing it in C, so the Rust implementation has quite good coverage of the different code paths now. Especially no struggling with build systems was necessary like it would have been in C thanks to cargo and Rust having built-in support. This alone seems to have the potential to cause Rust code having, on average, better quality than similar code written in C.

It is also possible to compile the Rust implementation into a C library with the great cargo-c tool. This easily builds the code as a static/dynamic C library and installs the library, a C header file and also a pkg-config file. With this the Rust implementation is a 100% drop-in replacement of the C libebur128. It is not even necessary to recompile existing code. See part 4 of this blog post for more details.

Dependencies

Apart from the Rust standard library the Rust implementation depends on two other, small and widely used crates. Unlike with C, depending on external dependencies is rather simple with Rust and cargo. The two crates in question are

  • smallvec for a dynamically sized vectors/arrays that can be stored on the stack up to a certain size and only then fall back to heap allocations. This allows to avoid a couple of heap allocations under normal usage.
  • bitflags, which provides a macro for implementing properly typed bitflags. This is used in the constructor of the main type for selecting the features and modes that should be enabled, which directly maps to how the C API works (just with less type-safety).

Unsafe Code

A common question when announcing a Rust port of some C library is how much unsafe code was necessary to reach the same performance as the C code. In this case there are two uses of unsafe code outside the FFI code to call the C implementation in the tests/benchmarks and the C API.

Resampler

The True Peak measurement is using a resampler to upsample the audio signal to a higher sample rate. As part of the most inner loop of the resampler a statically sized ringbuffer is used.

As part of that ringbuffer, explicit indexing of a slice is needed. While the indexes are already manually checked to wrap around when needed, the Rust compiler and LLVM can’t figure that out so additional bounds checks plus panic handling is present in the compiled code. Apart from slowing down the loop with the additional condition, the panic code also causes the whole loop to be optimized less well.

So to get around that, unsafe indexing into the slice is used for performance reasons. While it requires a human now to check the memory safety of the code instead of relying on the compiler, the code in question is simple and small enough that it shouldn’t be a problem in practice.

More on this in part 2 and part 3 of this blog post.

Flushing Denormals to Zero

The other use of unsafe code is in the filter that is applied to the incoming audio signal. On x86/x86-64 the MXCSR register temporarily gets the _MM_FLUSH_ZERO_ON bit set to flush denormal floating point number to zero. That is, denormals (i.e. very small numbers close to zero) as result of any floating point operation are considered as zero.

This happens both for performance reasons as well as correctness reasons. Operations on denormals are generally much slower than on normalized floating point numbers. This has a measurable impact on the performance in this case.

Also as the C library does the same and not flushing denormals to zero would lead to slightly different results. While this difference doesn’t matter in practice as it’s very very small, it would make it harder to compare the results of both implementations as they wouldn’t be as close to each other anymore.

Doing this affects every floating point operation that happens while that bit is set, but because these are only the floating point operations performed by this crate and it’s guaranteed that the bit is unset again (even in case of panics) before leaving the filter, this shouldn’t cause any problems for other code.

Additional Features

Once the C library was ported and performance was comparable to the C implementation, I shortly checked the issues reported on the C library to check if there’s any useful feature requests or bug reports that I could implement / fix in the Rust implementation. There were three, one of which I also wanted for a future project.

None of the new features are available via the C API at this point for compatibility reasons.

Resetting the State

For this one there was a PR already for the C library. Previously the only way to reset all measurements was to create a new instance, which involves new memory allocations, filter initialization, etc..

It’s easy enough to provide a reset method to do only the minimal work required to reset all measurements and restart with a fresh state so I’ve added that to the Rust implementation.

Fix set_max_window() to actually work

This was a bug introduced in the C implementation a while ago in an attempt to prevent integer overflows when calculating sizes of memory allocations, which then would cause memory safety bugs because less memory was allocated than expected. Accidentally this fix restricted the allowed values for the maximum window size too much. There is a PR for fixing this in the C implementation.

On the Rust side this bug also existed because I simply ported over the checks. If I hadn’t ported over the checks, or ported an earlier version without the checks, there fortunately wouldn’t have been any memory safety bug on the Rust side though but instead one of two situations would have happened instead

  1. In debug builds integer overflows cause a panic, so instead of allocating less memory than expected during the setting of the parameters there would’ve been a panic immediately instead of invalid memory accesses later.
  2. In release builds integer overflows simply wrap around for performance reasons. This would’ve caused less memory than expected to be allocated, but later when trying to access the memory there would’ve been a panic when trying to access memory outside the allocated area.

While a panic is also not nice, it at least leads to no undefined behaviour and prevents worse things from happening.

The proper fix in this case was to not restrict the maximum window size statically but to instead check for overflows during the calculations. This is the same the PR for the C implementation does, but on the Rust side this is much easier because of built-in operations like checked_mul for doing an overflow-checking multiplication. In C this requires some rather convoluted code (check the PR for details).

Support for Planar Audio Input

The last additional feature that I implemented was support for planar audio input, for which also a PR to the C implementation exists already.

Most of the time audio signals have the samples of each channel interleaved with each other, so for example for stereo you have an array of samples with the first sample for the left channel, the first sample for the right channel, the second sample for the left channel, etc.. While this representation has some advantages, in other situations it is easier or faster to work with planar audio: the samples of each channel are contiguous one after another, so you have e.g. first all the samples of the left channel one after another and only then all samples of the right channel.

The PR for the C implementation does this with some code duplication of existing macro code (which can be prevented by making the macros more complicated), on the Rust side I implemented this without any code duplication by adding an internal abstraction for interleaved/planar audio and iterating over the samples and then working with that in normal, generic Rust code. This required some minor refactoring and code reorganization but in the end was rather painless. Note that most of the change is addition of new tests and moving some code around.

When looking at the Samples trait, the main part of this refactoring, one might wonder why I used closures instead of Rust iterators for iterating over the samples and the reason is unfortunately performance. More on this in part 3 of this blog post.

Next Part

In the next part of this blog post I will describe the porting approach in detail and also give various examples for how to port C code to idiomatic Rust, and some examples of problems I was running into.

Automatic retry on error and fallback stream handling for GStreamer sources

A very common problem in GStreamer, especially when working with live network streams, is that the source might just fail at some point. Your own network might have problems, the source of the stream might have problems, …

Without any special handling of such situations, the default behaviour in GStreamer is to simply report an error and let the application worry about handling it. The application might for example want to restart the stream, or it might simply want to show an error to the user, or it might want to show a fallback stream instead, telling the user that the stream is currently not available and then seamlessly switch back to the stream once it comes back.

Implementing all of the aforementioned is quite some effort, especially to do it in a robust way. To make it easier for applications I implemented a new plugin called fallbackswitch that contains two elements to automate this.

It is part of the GStreamer Rust plugins and also included in the recent 0.6.0 release, which can also be found on the Rust package (“crate”) repository crates.io.

Installation

For using the plugin you most likely first need to compile it yourself, unless you’re lucky enough that e.g. your Linux distribution includes it already.

Compiling it requires a Rust toolchain and GStreamer 1.14 or newer. The former you can get via rustup for example, if you don’t have it yet, the latter either from your Linux distribution or by using the macOS, Windows, etc binaries that are provided by the GStreamer project. Once that is done, compiling is mostly a matter of running cargo build in the utils/fallbackswitch directory and copying the resulting libgstfallbackswitch.so (or .dll or .dylib) into one of the GStreamer plugin directories, for example ~/.local/share/gstreamer-1.0/plugins.

fallbackswitch

The first of the two elements is fallbackswitch. It acts as a filter that can be placed into any kind of live stream. It consumes one main stream (which must be live) and outputs this stream as-is if everything works well. Based on the timeout property it detects if this main stream didn’t have any activity for the configured amount of time, or everything arrived too late for that long, and then seamlessly switches to a fallback stream. The fallback stream is the second input of the element and does not have to be live (but it can be).

Switching between main stream and fallback stream doesn’t only work for raw audio and video streams but also works for compressed formats. The element will take constraints like keyframes into account when switching, and if necessary/possible also request new keyframes from the sources.

For example to play the Sintel trailer over the network and displaying a test pattern if it doesn’t produce any data, the following pipeline can be constructed:

gst-launch-1.0 souphttpsrc location=https://www.freedesktop.org/software/gstreamer-sdk/data/media/sintel_trailer-480p.webm ! \
    decodebin ! identity sync=true ! fallbackswitch name=s ! videoconvert ! autovideosink \
    videotestsrc ! s.fallback_sink

Note the identity sync=true in the main stream here as we have to convert it to an actual live stream.

Now when running the above command and disconnecting from the network, the video should freeze at some point and after 5 seconds a test pattern should be displayed.

However, when using fallbackswitch the application will still have to take care of handling actual errors from the main source and possibly restarting it. Waiting a bit longer after disconnecting the network with the above command will report an error, which then stops the pipeline.

To make that part easier there is the second element.

fallbacksrc

The second element is fallbacksrc and as the name suggests it is an actual source element. When using it, the main source can be configured via an URI or by providing a custom source element. Internally it then takes care of buffering the source, converting non-live streams into live streams and restarting the source transparently on errors. The various timeouts for this can be configured via properties.

Different to fallbackswitch it also handles audio and video at the same time and demuxes/decodes the streams.

Currently the only fallback streams that can be configured are still images for video. For audio the element will always output silence for now, and if no fallback image is configured for video it outputs black instead. In the future I would like to add support for arbitrary fallback streams, which hopefully shouldn’t be too hard. The basic infrastructure for it is already there.

To use it again in our previous example and having a JPEG image displayed whenever the source does not produce any new data, the following can be done:

gst-launch-1.0 fallbacksrc uri=https://www.freedesktop.org/software/gstreamer-sdk/data/media/sintel_trailer-480p.webm \
    fallback-uri=file:///path/to/some/jpg ! videoconvert ! autovideosink

Now when disconnecting the network, after a while (longer than before because fallbacksrc does additional buffering for non-live network streams) the fallback image should be shown. Different to before, waiting longer will not lead to an error and reconnecting the network causes the video to reappear. However as this is not an actual live-stream, right now playback would again start from the beginning. Seeking back to the previous position would be another potential feature that could be added in the future.

Overall these two elements should make it easier for applications to handle errors in live network sources. While the two elements are still relatively minimal feature-wise, they should already be usable in various real scenarios and are already used in production.

As usual, if you run into any problems or are missing some features, please create an issue in the GStreamer bug tracker.

GStreamer Rust Bindings & Plugins New Releases

It has been quite a while since the last status update for the GStreamer Rust bindings and the GStreamer Rust plugins, so the new releases last week make for a good opportunity to do so now.

Bindings

I won’t write too much about the bindings this time. The latest version as of now is 0.16.1, which means that since I started working on the bindings there were 8 major releases. In that same time there were 45 contributors working on the bindings, which seems quite a lot and really makes me happy.

Just as before, I don’t think any major APIs are missing from the bindings anymore, even for implementing subclasses of the various GStreamer types. The wide usage of the bindings in Free Software projects and commercial products also shows both the interest in writing GStreamer applications and plugins in Rust as well as that the bindings are complete enough and production-ready.

Most of the changes since the last status update involve API cleanups, usability improvements, various bugfixes and addition of minor API that was not included before. The details of all changes can be read in the changelog.

The bindings work with any GStreamer version since 1.8 (released more than 4 years ago), support APIs up to GStreamer 1.18 (to be released soon) and work with Rust 1.40 or newer.

Plugins

The biggest progress probably happened with the GStreamer Rust plugins.

There also was a new release last week, 0.6.0, which was the first release where selected plugins were also uploaded to the Rust package (“crate”) database crates.io. This makes it easy for Rust applications to embed any of these plugins statically instead of depending on them to be available on the system.

Overall there are now 40 GStreamer elements in 18 plugins by 28 contributors available as part of the gst-plugins-rs repository, one tutorial plugin with 4 elements and various plugins in external locations.

These 40 GStreamer elements are the following:

Audio
  • rsaudioecho: Port of the audioecho element from gst-plugins-good
  • rsaudioloudnorm: Live audio loudness normalization element based on the FFmpeg af_loudnorm filter
  • claxondec: FLAC lossless audio codec decoder element based on the pure-Rust claxon implementation
  • csoundfilter: Audio filter that can use any filter defined via the Csound audio programming language
  • lewtondec: Vorbis audio decoder element based on the pure-Rust lewton implementation
Video
  • cdgdec/cdgparse: Decoder and parser for the CD+G video codec based on a pure-Rust CD+G implementation, used for example by karaoke CDs
  • cea608overlay: CEA-608 Closed Captions overlay element
  • cea608tott: CEA-608 Closed Captions to timed-text (e.g. VTT or SRT subtitles) converter
  • tttocea608: CEA-608 Closed Captions from timed-text converter
  • mccenc/mccparse: MacCaption Closed Caption format encoder and parser
  • sccenc/sccparse: Scenarist Closed Caption format encoder and parser
  • dav1dec: AV1 video decoder based on the dav1d decoder implementation by the VLC project
  • rav1enc: AV1 video encoder based on the fast and pure-Rust rav1e encoder implementation
  • rsflvdemux: Alternative to the flvdemux FLV demuxer element from gst-plugins-good, not feature-equivalent yet
  • rsgifenc/rspngenc: GIF/PNG encoder elements based on the pure-Rust implementations by the image-rs project
Text
  • textwrap: Element for line-wrapping timed text (e.g. subtitles) for better screen-fitting, including hyphenation support for some languages
Network
  • reqwesthttpsrc: HTTP(S) source element based on the Rust reqwest/hyper HTTP implementations and almost feature-equivalent with the main GStreamer HTTP source souphttpsrc
  • s3src/s3sink: Source/sink element for the Amazon S3 cloud storage
  • awstranscriber: Live audio to timed text transcription element using the Amazon AWS Transcribe API
Generic
  • sodiumencrypter/sodiumdecrypter: Encryption/decryption element based on libsodium/NaCl
  • togglerecord: Recording element that allows to pause/resume recordings easily and considers keyframe boundaries
  • fallbackswitch/fallbacksrc: Elements for handling potentially failing (network) sources, restarting them on errors/timeout and showing a fallback stream instead
  • threadshare: Set of elements that provide alternatives for various existing GStreamer elements but allow to share the streaming threads between each other to reduce the number of threads
  • rsfilesrc/rsfilesink: File source/sink elements as replacements for the existing filesrc/filesink elements

Live loudness normalization in GStreamer & experiences with porting a C audio filter to Rust

A few months ago I wrote a new GStreamer plugin: an audio filter for live loudness normalization and automatic gain control.

The plugin can be found as part of the GStreamer Rust plugin in the audiofx plugin. It’s also included in the recent 0.6.0 release of the GStreamer Rust plugins and available from crates.io.

Its code is based on Kyle Swanson’s great FFmpeg filter af_loudnorm, about which he wrote some more technical details on his blog a few years back. I’m not going to repeat all that here, if you’re interested in those details and further links please read Kyle’s blog post.

From a very high-level, the filter works by measuring the loudness of the input following the EBU R128 standard with a 3s lookahead, adjusts the gain to reach the target loudness and then applies a true peak limiter with 10ms to prevent any too high peaks to get passed through. Both the target loudness and the maximum peak can be configured via the loudness-target and max-true-peak properties, same as in the FFmpeg filter. Different to the FFmpeg filter I only implemented the “live” mode and not the two-pass mode that is implemented in FFmpeg, which first measures the loudness of the whole stream and then in a second pass adjusts it.

Below I’ll describe the usage of the filter in GStreamer a bit and also some information about the development process, and the porting of the C code to Rust.

Usage

For using the filter you most likely first need to compile it yourself, unless you’re lucky enough that e.g. your Linux distribution includes it already.

Compiling it requires a Rust toolchain and GStreamer 1.8 or newer. The former you can get via rustup for example, if you don’t have it yet, the latter either from your Linux distribution or by using the macOS, Windows, etc binaries that are provided by the GStreamer project. Once that is done, compiling is mostly a matter of running cargo build in the audio/audiofx directory and copying the resulting libgstrsaudiofx.so (or .dll or .dylib) into one of the GStreamer plugin directories, for example ~/.local/share/gstreamer-1.0/plugins.

After that boring part is done, you can use it for example as follows to run loudness normalization on the Sintel trailer:

gst-launch-1.0 playbin \
    uri=https://www.freedesktop.org/software/gstreamer-sdk/data/media/sintel_trailer-480p.webm \
    audio-filter="audioresample ! rsaudioloudnorm ! audioresample ! capsfilter caps=audio/x-raw,rate=48000"

As can be seen above, it is necessary to put audioresample elements around the filter. The reason for that is that the filter currently only works on 192kHz input. This is a simplification for now to make it easier inside the filter to detect true peaks. You would first upsample your audio to 192kHz and then, if needed, later downsample it again to your target sample rate (48kHz in the example above). See the link mentioned before for details about true peaks and why this is generally a good idea to do. In the future the resampling could be implemented internally and maybe optionally the filter could also work with “normal” peak detection on the non-upsampled input.

Apart from that caveat the filter element works like any other GStreamer audio filter and can be placed accordingly in any GStreamer pipeline.

If you run into any problems using the code or it doesn’t work well for your use-case, please create an issue in the GStreamer bugtracker.

The process

As I wrote above, the GStreamer plugin is part of the GStreamer Rust plugins so the first step was to port the FFmpeg C code to Rust. I expected that to be the biggest part of the work, but as writing Rust is simply so much more enjoyable than writing C and I would have to adjust big parts of the code to fit the GStreamer infrastructure anyway, I took this approach nonetheless. The alternative of working based on the C code and writing the plugin in C didn’t seem very appealing to me. In the end, as usual when developing in Rust, this also allowed me to be more confident about the robustness of the result and probably reduced the amount of time spent debugging. Surprisingly, the translation was actually not the biggest part of the work, but instead I had to debug a couple of issues that were already present in the original FFmpeg code and find solutions for them. But more on that later.

The first step for porting the code was to get an implementation of the EBU R128 loudness analysis. In FFmpeg they’re using a fork of the libebur128 C library. I checked if there was anything similar for Rust already, maybe even a pure-Rust implementation of it, but couldn’t find anything. As I didn’t want to write one myself or port the code of the libebur128 C library to Rust, I wrote safe Rust bindings for that library instead. The end result of that can be found on crates.io as an independent crate, in case someone else also needs it for other purposes at some point. The crate also includes the code of the C library, making it as easy as possible to build and include into other projects.

The next step was to actually port the FFmpeg C code to Rust. In the end that was a rather straightforward translation fortunately. The latest version of that code can be found here.

The biggest difference to the C code is the usage of Rust iterators and iterator combinators like zip and chunks_exact. In my opinion this makes the code quite a bit easier to read compared to the manual iteration in the C code together with array indexing, and as a side effect it should also make the code run faster in Rust as it allows to get rid of a lot of array bounds checks.

Apart from that, one part that was a bit inconvenient during that translation and still required manual array indexing is the usage of ringbuffers everywhere in the code. For now I wrote those like I would in C and used a few unsafe operations like get_unchecked to avoid redundant bounds checks, but at a later time I might refactor this into a proper ringbuffer abstraction for such audio processing use-cases. It’s not going to be the last time I need such a data structure. A short search on crates.io gave various results for ringbuffers but none of them seem to provide an API that fits the use-case here. Once that’s abstracted away into a nice data structure, I believe the Rust code of this filter is really nice to read and follow.

Now to the less pleasant parts, and also a small warning to all the people asking for Rust rewrites of everything: of course I introduced a couple of new bugs while translating the code although this was a rather straightforward translation and I tried to be very careful. I’m sure there is also still a bug or two left that I didn’t find while debugging. So always keep in mind that rewriting a project will also involve adding new bugs that didn’t exist in the original code. Or maybe you’re just a better programmer than me and don’t make such mistakes.

Debugging these issues that showed up while testing the code was a good opportunity to also add extensive code comments everywhere so I don’t have to remind myself every time again what this block of code is doing exactly, and it’s something I was missing a bit from the FFmpeg code (it doesn’t have a single comment currently). While writing those comments and explaining the code to myself, I found the majority of these bugs that I introduced and as a side-effect I now have documentation for my future self or other readers of the code.

Fixing these issues I introduced myself wasn’t that time-consuming neither in the end fortunately, but while writing those code comments and also while doing more testing on various audio streams, I found a couple of bugs that already existed in the original FFmpeg C code. Further testing also showed that they caused quite audible distortions on various test streams. These are the bugs that unfortunately took most of the time in the whole process, but at least to my knowledge there are no known bugs left in the code now.

For these bugs in the FFmpeg code I also provided a fix that is merged already, and reported the other two in their bug tracker.

The first one I’d be happy to provide a fix for if my approach is considered correct, but the second one I’ll leave for someone else. Porting over my Rust solution for that one will take some time and getting all the array indexing involved correct in C would require some serious focusing, for which I currently don’t have the time.

Or maybe my solutions to these problems are actually wrong, or my understanding of the original code was wrong and I actually introduced them in my translation, which also would be useful to know.

Overall, while porting the C code to Rust introduced a few new problems that had to be fixed, I would definitely do this again for similar projects in the future. It’s more fun to write and in my opinion the resulting code is easier readable, and better to maintain and extend.

The GTK Rust bindings are not ready yet? Yes they are!

When talking to various people at conferences in the last year or at conferences, a recurring topic was that they believed that the GTK Rust bindings are not ready for use yet.

I don’t know where that perception comes from but if it was true, there wouldn’t have been applications like Fractal, Podcasts or Shortwave using GTK from Rust, or I wouldn’t be able to do a workshop about desktop application development in Rust with GTK and GStreamer at the Linux Application Summit in Barcelona this Friday (code can be found here already) or earlier this year at GUADEC.

One reason I sometimes hear is that there is not support for creating subclasses of GTK types in Rust yet. While that was true, it is not true anymore nowadays. But even more important: unless you want to create your own special widgets, you don’t need that. Many examples and tutorials in other languages make use of inheritance/subclassing for the applications’ architecture, but that’s because it is the idiomatic pattern in those languages. However, in Rust other patterns are more idiomatic and even for those examples and tutorials in other languages it wouldn’t be the one and only option to design applications.

Almost everything is included in the bindings at this point, so seriously consider writing your next GTK UI application in Rust. While some minor features are still missing from the bindings, none of those should prevent you from successfully writing your application.

And if something is actually missing for your use-case or something is not working as expected, please let us know. We’d be happy to make your life easier!

P.S.

Some people are already experimenting with new UI development patterns on top of the GTK Rust bindings. So if you want to try developing an UI application but want to try something different than the usual signal/callback spaghetti code, also take a look at those.

MPSC Channel API for painless usage of threads with GTK in Rust

A very common question that comes up on IRC or elsewhere by people trying to use the gtk-rs GTK bindings in Rust is how to modify UI state, or more specifically GTK widgets, from another thread.

Due to GTK only allowing access to its UI state from the main thread and Rust actually enforcing this, unlike other languages, this is less trivial than one might expect. To make this as painless as possible, while also encouraging a more robust threading architecture based on message-passing instead of shared state, I’ve added some new API to the glib-rs bindings: An MPSC (multi-producer/single-consumer) channel very similar to (and based on) the one in the standard library but integrated with the GLib/GTK main loop.

While I’ll mostly write about this in the context of GTK here, this can also be useful in other cases when working with a GLib main loop/context from Rust to have a more structured means of communication between different threads than shared mutable state.

This will be part of the next release and you can find some example code making use of this at the very end. But first I’ll take this opportunity to also explain why it’s not so trivial in Rust first and also explain another solution.

Table of Contents

  1. The Problem
  2. One Solution: Safely working around the type system
  3. A better solution: Message passing via channels

The Problem

Let’s consider the example of an application that has to perform a complicated operation and would like to do this from another thread (as it should to not block the UI!) and in the end report back the result to the user. For demonstration purposes let’s take a thread that simply sleeps for a while and then wants to update a label in the UI with a new value.

Naively we might start with code like the following

let label = gtk::Label::new("not finished");
[...]
// Clone the label so we can also have it available in our thread.
// Note that this behaves like an Rc and only increases the
// reference count.
let label_clone = label.clone();
thread::spawn(move || {
    // Let's sleep for 10s
    thread::sleep(time::Duration::from_secs(10));

    label_clone.set_text("finished");
});

This does not compile and the compiler tells us (between a wall of text containing all the details) that the label simply can’t be sent safely between threads. Which is absolutely correct.

error[E0277]: {{EJS31}} cannot be sent between threads safely
  --> src/main.rs:28:5
   |
28 |     thread::spawn(move || {
   |     ^^^^^^^^^^^^^ {{EJS32}} cannot be sent between threads safely
   |
   = help: within {{EJS33}}, the trait {{EJS34}} is not implemented for {{EJS35}}
   = note: required because it appears within the type {{EJS36}}
   = note: required because it appears within the type {{EJS37}}
   = note: required because it appears within the type {{EJS38}}
   = note: required because it appears within the type {{EJS39}}
   = note: required by {{EJS40}}

In, e.g. C, this would not be a problem at all, the compiler does not know about GTK widgets and generally all GTK API to be only safely usable from the main thread, and would happily compile the above. It would the our (the programmer’s) job then to ensure that nothing is ever done with the widget from the other thread, other than passing it around. Among other things, it must also not be destroyed from that other thread (i.e. it must never have the last reference to it and then drop it).

One Solution: Safely working around the type system

So why don’t we do the same as we would do in C and simply pass around raw pointers to the label and do all the memory management ourselves? Well, that would defeat one of the purposes of using Rust and would require quite some unsafe code.

We can do better than that and work around Rust’s type system with regards to thread-safety and instead let the relevant checks (are we only ever using the label from the main thread?) be done at runtime instead. This allows for completely safe code, it might just panic at any time if we accidentally try to do something from wrong thread (like calling a function on it, or dropping it) and not just pass the label around.

The fragile crate provides a type called Fragile for exactly this purpose. It’s a wrapper type like Box, RefCell, Rc, etc. but it allows for any contained type to be safely sent between threads and on access does runtime checks if this is done correctly. In our example this would look like this

let label = gtk::Label::new("not finished");
[...]
// We wrap the label clone in the Fragile type here
// and move that into the new thread instead.
let label_clone = fragile::Fragile::new(label.clone());
thread::spawn(move || {
    // Let's sleep for 10s
    thread::sleep(time::Duration::from_secs(10));

    // To access the contained value, get() has
    // to be called and this is where the runtime
    // checks are happening
    label_clone.get().set_text("finished");
});

Not many changes to the code and it compiles… but at runtime we of course get a panic because we’re accessing the label from the wrong thread

thread '<unnamed>' panicked at 'trying to access wrapped value in fragile container from incorrect thread.', ~/.cargo/registry/src/github.com-1ecc6299db9ec823/fragile-0.3.0/src/fragile.rs:57:13

What we instead need to do here is to somehow defer the change of the label to the main thread, and GLib provides various API for doing exactly that. We’ll make use of the first one here but it’s mostly a matter of taste (and trait bounds: the former takes a FnOnce closure while the latter can be called multiple times and only takes FnMut because of that).

let label = gtk::Label::new("not finished");
[...]
// We wrap the label clone in the Fragile type here
// and move that into the new thread instead.
let label_clone = fragile::Fragile::new(label.clone());
thread::spawn(move || {
    // Let's sleep for 10s
    thread::sleep(time::Duration::from_secs(10));

    // Defer the label update to the main thread.
    // For this we get the default main context,
    // the one used by GTK on the main thread,
    // and use invoke() on it. The closure also
    // takes ownership of the label_clone and drops
    // it at the end. From the correct thread!
    glib::MainContext::default().invoke(move || {
        label_clone.get().set_text("finished");
    });
});

So far so good, this compiles and actually works too. But it feels kind of fragile, and that’s not only because of the name of the crate we use here. The label passed around in different threads is like a landmine only waiting to explode when we use it in the wrong way.

It’s also not very nice because now we conceptually share mutable state between different threads, which is the underlying cause for many thread-safety issues and generally increases complexity of the software considerable.

Let’s try to do better, Rust is all about fearless concurrency after all.

A better solution: Message passing via channels

As the title of this post probably made clear, the better solution is to use channels to do message passing. That’s also a pattern that is generally preferred in many other languages that focus a lot on concurrency, ranging from Erlang to Go, and is also the the recommended way of doing this according to the Rust Book.

So how would this look like? We first of all would have to create a Channel for communicating with our main thread.

As the main thread is running a GLib main loop with its corresponding main context (the loop is the thing that actually is… a loop, and the context is what keeps track of all potential event sources the loop has to handle), we can’t make use of the standard library’s MPSC channel. The Receiver blocks or we would have to poll in intervals, which is rather inefficient.

The futures MPSC channel doesn’t have this problem but requires a futures executor to run on the thread where we want to handle the messages. While the GLib main context also implements a futures executor and we could actually use it, this would pull in the futures crate and all its dependencies and might seem like too much if we only ever use it for message passing anyway. Otherwise, if you use futures also for other parts of your code, go ahead and use the futures MPSC channel instead. It basically works the same as what follows.

For creating a GLib main context channel, there are two functions available: glib::MainContext::channel() and glib::MainContext::sync_channel(). The latter takes a bound for the channel, after which sending to the Sender part will block until there is space in the channel again. Both are returning a tuple containing the Sender and Receiver for this channel, and especially the Sender is working exactly like the one from the standard library. It can be cloned, sent to different threads (as long as the message type of the channel can be) and provides basically the same API.

The Receiver works a bit different, and closer to the for_each() combinator on the futures Receiver. It provides an attach() function that attaches it to a specific main context, and takes a closure that is called from that main context whenever an item is available.

The other part that we need to define on our side then is how the messages should look like that we send through the channel. Usually some kind of enum with all the different kinds of messages you want to handle is a good choice, in our case it could also simply be () as we only have a single kind of message and without payload. But to make it more interesting, let’s add the new string of the label as payload to our messages.

This is how it could look like for example

enum Message {
    UpdateLabel(String),
}
[...]
let label = gtk::Label::new("not finished");
[...]
// Create a new sender/receiver pair with default priority
let (sender, receiver) = glib::MainContext::channel(glib::PRIORITY_DEFAULT);

// Spawn the thread and move the sender in there
thread::spawn(move || {
    thread::sleep(time::Duration::from_secs(10));

    // Sending fails if the receiver is closed
    let _ = sender.send(Message::UpdateLabel(String::from("finished")));
});

// Attach the receiver to the default main context (None)
// and on every message update the label accordingly.
let label_clone = label.clone();
receiver.attach(None, move |msg| {
    match msg {
        Message::UpdateLabel(text) => label_clone.set_text(text.as_str()),
    }

    // Returning false here would close the receiver
    // and have senders fail
    glib::Continue(true)
});

While this is a bit more code than the previous solution, it will also be more easy to maintain and generally allows for clearer code.

We keep all our GTK widgets inside the main thread now, threads only get access to a sender over which they can send messages to the main thread and the main thread handles these messages in whatever way it wants. There is no shared mutable state between the different threads here anymore, apart from the channel itself.

GStreamer Rust bindings 0.12 and GStreamer Plugin 0.3 release

After almost 6 months, a new release of the GStreamer Rust bindings and the GStreamer plugin writing infrastructure for Rust is out. As usual this was coinciding with the release of all the gtk-rs crates to make use of all the new features they contain.

Thanks to all the contributors of both gtk-rs and the GStreamer bindings for all the nice changes that happened over the last 6 months!

And as usual, if you find any bugs please report them and if you have any questions let me know.

GStreamer Bindings

For the full changelog check here.

Most changes this time were internally, especially because many user-facing changes (like Debug impls for various types) were already backported to the minor releases in the 0.11 release series.

WebRTC

The biggest change this time is probably the inclusion of bindings for the GStreamer WebRTC library.

This allows using building all kinds of WebRTC applications outside the browser (or providing a WebRTC implementation for a browser), and while not as full-featured as Google’s own implementation, this interoperates well with the various browsers and generally works much better on embedded devices.

A small example application in Rust is available here.

Serde

Optionally, serde trait implementations for the Serialize and Deserialize trait can be enabled for various fundamental GStreamer types, including caps, buffers, events, messages and tag lists. This allows serializing them into any format that can be handled by serde (which are many!), and deserializing them back to normal Rust structs.

Generic Tag API

Previously only a strongly-typed tag API was exposed that made it impossible to use the wrong data type for a specific tag, e.g. code that tries to store a string for the track number or an integer for the title would simply not compile:

let mut tags = gst::TagList::new();
{
    let tags = tags.get_mut().unwrap();
    tags.add::<Title>(&"some title", gst::TagMergeMode::Append);
    tags.add::<TrackNumber>(&12, gst::TagMergeMode::Append);
}

While this is convenient, it made it rather complicated to work with tag lists if you only wanted to handle them in a generic way. For example by iterating over the tag list and simply checking what kind of tags are available. To solve that, a new generic API was added in addition. This works on glib::Values, which can store any kind of type, and using the wrong type for a specific tag would simply cause an error at runtime instead of compile-time.

let mut tags = gst::TagList::new();
{
    let tags = tags.get_mut().unwrap();
    tags.add_generic(&gst::tags::TAG_TITLE, &"some title", gst::TagMergeMode::Append)
.expect("wrong type for title tag");
    tags.add_generic(&gst::tags::TAG_TRACK_NUMBER, &12, gst::TagMergeMode::Append)
.expect("wrong type for track number tag");
}

This also greatly simplified the serde serialization/deserialization for tag lists.

GStreamer Plugins

For the full changelog check here.

gobject-subclass

The main change this time is that all the generic GObject subclassing infrastructure was moved out of the gst-plugin crate and moved to its own gobject-subclass crate as part of the gtk-rs organization.

As part of this, some major refactoring has happened that allows subclassing more different types but also makes it simpler to add new types. There are also experimental crates for adding some subclassing support to gio and gtk, and a PR for autogenerating part of the code via the gir code generator.

More classes!

The other big addition this time is that it’s now possible to subclass GStreamer Pads and GhostPads, to implement the ChildProxy interface and to subclass the Aggregator and AggregatorPad class.

This now allows to write custom mixer/muxer-style elements (or generally elements that have multiple sink pads) in Rust via the Aggregator base class, and to have custom pad types for elements to allow for setting custom properties on the pads (e.g. to control the opacity of a single video mixer input).

There is currently no example for such an element, but I’ll add a very simple video mixer to the repository some time in the next weeks and will also write a blog post about it for explaining all the steps.

GLib/GIO async operations and Rust futures + async/await

Unfortunately I was not able to attend the Rust+GNOME hackfest in Madrid last week, but I could at least spend some of my work time at Centricular on implementing one of the things I wanted to work on during the hackfest. The other one, more closely related to the gnome-class work, will be the topic of a future blog post once I actually have something to show.

So back to the topic. With the latest GIT version of the Rust bindings for GLib, GTK, etc it is now possible to make use of the Rust futures infrastructure for GIO async operations and various other functions. This should make writing of GNOME, and in general GLib-using, applications in Rust quite a bit more convenient.

For the impatient, the summary is that you can use Rust futures with GLib and GIO now, that it works both on the stable and nightly version of the compiler, and with the nightly version of the compiler it is also possible to use async/await. An example with the latter can be found here, and an example just using futures without async/await here.

Table of Contents

  1. Futures
    1. Futures in Rust
    2. Async/Await
    3. Tokio
  2. Futures & GLib/GIO
    1. Callbacks
    2. GLib Futures
    3. GIO Asynchronous Operations
    4. Async/Await
  3. The Future

Futures

First of all, what are futures and how do they work in Rust. In a few words, a future (also called promise elsewhere) is a value that represents the result of an asynchronous operation, e.g. establishing a TCP connection. The operation itself (usually) runs in the background, and only once the operation is finished (or fails), the future resolves to the result of that operation. There are all kinds of ways to combine futures, e.g. to execute some other (potentially async) code with the result once the first operation has finished.

It’s a concept that is also widely used in various other programming languages (e.g. C#, JavaScript, Python, …) for asynchronous programming and can probably be considered a proven concept at this point.

Futures in Rust

In Rust, a future is basically an implementation of relatively simple trait called Future. The following is the definition as of now, but there are discussions to change/simplify/generalize it currently and to also move it to the Rust standard library:

pub trait Future {
    type Item;
    type Error;

    fn poll(&mut self, cx: &mut task::Context) -> Poll<Self::Item, Self::Error>;
}

Anything that implements this trait can be considered an asynchronous operation that resolves to either an Item or an Error. Consumers of the future would call the poll method to check if the future has resolved already (to a result or error), or if the future is not ready yet. In case of the latter, the future itself would at a later point, once it is ready to proceed, notify the consumer about that. It would get a way for notifications from the Context that is passed, and proceeding does not necessarily mean that the future will resolve after this but it could just advance its internal state closer to the final resolution.

Calling poll manually is kind of inconvenient, so generally this is handled by an Executor on which the futures are scheduled and which is running them until their resolution. Equally, it’s inconvenient to have to implement that trait directly so for most common operations there are combinators that can be used on futures to build new futures, usually via closures in one way or another. For example the following would run the passed closure with the successful result of the future, and then have it return another future (Ok(()) is converted via IntoFuture to the future that always resolves successfully with ()), and also maps any errors to ()

fn our_future() -> impl Future<Item = (), Err = ()> {
    some_future
        .and_then(|res| {
            do_something(res);
            Ok(())
        })
        .map_err(|_| ())
}

A future represents only a single value, but there is also a trait for something producing multiple values: a Stream. For more details, best to check the documentation.

Async/Await

The above way of combining futures via combinators and closures is still not too great, and is still close to callback hell. In other languages (e.g. C#, JavaScript, Python, …) this was solved by introducing new features to the language: async for declaring futures with normal code flow, and await for suspending execution transparently and resuming at that point in the code with the result of a future.

Of course this was also implemented in Rust. Currently based on procedural macros, but there are discussions to actually move this also directly into the language and standard library.

The above example would look something like the following with the current version of the macros

#[async]
fn our_future() -> Result<(), ()> {
    let res = await!(some_future)
        .map_err(|_| ())?;

    do_something(res);
    Ok(())
}

This looks almost like normal, synchronous code but is internally converted into a future and completely asynchronous.

Unfortunately this is currently only available on the nightly version of Rust until various bits and pieces get stabilized.

Tokio

Most of the time when people talk about futures in Rust, they implicitly also mean Tokio. Tokio is a pure Rust, cross-platform asynchronous IO library and based on the futures abstraction above. It provides a futures executor and various types for asynchronous IO, e.g. sockets and socket streams.

But while Tokio is a great library, we’re not going to use it here and instead implement a futures executor around GLib. And on top of that implement various futures, also around GLib’s sister library GIO, which is providing lots of API for synchronous and asynchronous IO.

Just like all IO operations in Tokio, all GLib/GIO asynchronous operations are dependent on running with their respective event loop (i.e. the futures executor) and while it’s possible to use both in the same process, each operation has to be scheduled on the correct one.

Futures & GLib/GIO

Asynchronous operations and generally everything event related (timeouts, …) are based on callbacks that you have to register, and are running via a GMainLoop that is executing events from a GMainContext. The latter is just something that stores everything that is scheduled and provides API for polling if something is ready to be executed now, while the former does exactly that: executing.

Callbacks

The callback based API is also available via the Rust bindings, and would for example look as follows

glib::timeout_add(20, || {
    do_something_after_20ms();
    glib::Continue(false) // don't call again
});

glib::idle_add(|| {
    do_something_from_the_main_loop();
    glib::Continue(false) // don't call again
});

some_async_operation(|res| {
    match res {
        Err(err) => report_error_somehow(),
        Ok(res) => {
            do_something_with_result(res);
            some_other_async_operation(|res| {
                do_something_with_other_result(res);
            });
        }
    }
});

As can be seen here already, the callback-based approach leads to quite non-linear code and deep indentation due to all the closures. Also error handling becomes quite tricky due to somehow having handle them from a completely different call stack.

Compared to C this is still far more convenient due to actually having closures that can capture their environment, but we can definitely do better in Rust.

The above code also assumes that somewhere a main loop is running on the default main context, which could be achieved with the following e.g. inside main()

let ctx = glib::MainContext::default();
let l = glib::MainLoop::new(Some(&ctx), false);
ctx.push_thread_default();

// All operations here would be scheduled on this main context
do_things(&l);

// Run everything until someone calls l.quit()
l.run();
ctx.pop_thread_default();

It is also possible to explicitly select for various operations on which main context they should run, but that’s just a minor detail.

GLib Futures

To make this situation a bit nicer, I’ve implemented support for futures in the Rust bindings. This means, that the GLib MainContext is now a futures executor (and arbitrary futures can be scheduled on it), all the GSource related operations in GLib (timeouts, UNIX signals, …) have futures- or stream-based variants and all the GIO asynchronous operations also come with futures variants now. The latter are autogenerated with the gir bindings code generator.

For enabling usage of this, the futures feature of the glib and gio crates have to be enabled, but that’s about it. It is currently still hidden behind a feature gate because the futures infrastructure is still going to go through some API incompatible changes in the near future.

So let’s take a look at how to use it. First of all, setting up the main context and executing a trivial future on it

let c = glib::MainContext::default();
let l = glib::MainLoop::new(Some(&c), false);

c.push_thread_default();

// Spawn a future that is called from the main context
// and after printing something just quits the main loop
let l_clone = l.clone();
c.spawn(futures::lazy(move |_| {
    println!("we're called from the main context");
    l_clone.quit();
    Ok(())
});

l.run();

c.pop_thread_default();

Apart from spawn(), there is also a spawn_local(). The former can be called from any thread but requires the future to implement the Send trait (that is, it must be safe to send it to other threads) while the latter can only be called from the thread that owns the main context but it allows any kind of future to be spawned. In addition there is also a block_on() function on the main context, which allows to run non-static futures up to their completion and returns their result. The spawn functions only work with static futures (i.e. they have no references to any stack frame) and requires the futures to be infallible and resolve to ().

The above code already showed one of the advantages of using futures: it is possible to use all generic futures (that don’t require a specific executor), like futures::lazy or the mpsc/oneshot channels with GLib now. And any of the combinators that are available on futures

let c = MainContext::new();
                                                                                                       
let res = c.block_on(timeout_future(20)
    .and_then(move |_| {
        // Called after 20ms
        Ok(1)
    })
);

assert_eq!(res, Ok(1));

This example also shows the block_on functionality to return an actual value from the future (1 in this case).

GIO Asynchronous Operations

Similarly, all asynchronous GIO operations are now available as futures. For example to open a file asynchronously and getting a gio::InputStream to read from, the following could be done

let file = gio::File::new_for_path("Cargo.toml");

let l_clone = l.clone();
c.spawn_local(
    // Try to open the file
    file.read_async_future(glib::PRIORITY_DEFAULT)
        .map_err(|(_file, err)| {
            format!("Failed to open file: {}", err)
        })
        .and_then(move |(_file, strm)| {
            // Here we could now read from the stream, but
            // instead we just quit the main loop
            l_clone.quit();

            Ok(())
        })
);

A bigger example can be found in the gtk-rs examples repository here. This example is basically reading a file asynchronously in 64 byte chunks and printing it to stdout, then closing the file.

In the same way, network operations or any other asynchronous operation can be handled via futures now.

Async/Await

Compared to a callback-based approach, that bigger example is already a lot nicer but still quite heavy to read. With the async/await extension that I mentioned above already, the code looks much nicer in comparison and really almost like synchronous code. Except that it is not synchronous.

#[async]
fn read_file(file: gio::File) -> Result<(), String> {
    // Try to open the file
    let (_file, strm) = await!(file.read_async_future(glib::PRIORITY_DEFAULT))
        .map_err(|(_file, err)| format!("Failed to open file: {}", err))?;

    Ok(())
}

fn main() {
    [...]
    let future = async_block! {
        match await!(read_file(file)) {
            Ok(()) => (),
            Err(err) => eprintln!("Got error: {}", err),
        }
        l_clone.quit();
        Ok(())
    };

    c.spawn_local(future);
    [...]
}

For compiling this code, the futures-nightly feature has to be enabled for the glib crate, and a nightly compiler must be used.

The bigger example from before with async/await can be found here.

With this we’re already very close in Rust to having the same convenience as in other languages with asynchronous programming. And also it is very similar to what is possible in Vala with GIO asynchronous operations.

The Future

For now this is all finished and available from GIT of the glib and gio crates. This will have to be updated in the future whenever the futures API is changing, but it is planned to stabilize all this in Rust until the end of this year.

In the future it might also make sense to add futures variants for all the GObject signal handlers, so that e.g. handling a click on a GTK+ button could be done similarly from a future (or rather from a Stream as a signal can be emitted multiple times). If this is in the end more convenient than the callback-based approach that is currently used, is to be seen. Some experimentation would be necessary here. Also how to handle return values of signal handlers would have to be figured out.

Improving GStreamer performance on a high number of network streams by sharing threads between elements with Rust’s tokio crate

For one of our customers at Centricular we were working on a quite interesting project. Their use-case was basically to receive an as-high-as-possible number of audio RTP streams over UDP, transcode them, and then send them out via UDP again. Due to how GStreamer usually works, they were running into some performance issues.

This blog post will describe the first set of improvements that were implemented for this use-case, together with a minimal benchmark and the results. My colleague Mathieu will follow up with one or two other blog posts with the other improvements and a more full-featured benchmark.

The short version is that CPU usage decreased by about 65-75%, i.e. allowing 3-4x more streams with the same CPU usage. Also parallelization works better and usage of different CPU cores is more controllable, allowing for better scalability. And a fixed, but configurable number of threads is used, which is independent of the number of streams.

The code for this blog post can be found here.

Table of Contents

  1. GStreamer & Threads
  2. Thread-Sharing GStreamer Elements
  3. Available Elements
  4. Little Benchmark
  5. Conclusion

GStreamer & Threads

In GStreamer, by default each source is running from its own OS thread. Additionally, for receiving/sending RTP, there will be another thread in the RTP jitterbuffer, yet another thread for receiving RTCP (another source) and a last thread for sending RTCP at the right times. And RTCP has to be received and sent for the receiver and sender side part of the pipeline, so the number of threads doubles. In the sum this gives at least 1 + 1 + (1 + 1) * 2 = 6 threads per RTP stream in this scenario. In a normal audio scenario, there will be one packet received/sent e.g. every 20ms on each stream, and every now and then an RTCP packet. So most of the time all these threads are only waiting.

Apart from the obvious waste of OS resources (1000 streams would be 6000 threads), this also brings down performance as all the time threads are being woken up. This means that context switches have to happen basically all the time.

To solve this we implemented a mechanism to share threads, and in the end as a result we have a fixed, but configurable number of threads that is independent from the number of streams. And can run e.g. 500 streams just fine on a single thread with a single core, which was completely impossible before. In addition we also did some work to reduce the number of allocations for each packet, so that after startup no additional allocations happen per packet anymore for buffers. See Mathieu’s upcoming blog post for details.

In this blog post, I’m going to write about a generic mechanism for sources, queues and similar elements to share their threads between each other. For the RTP related bits (RTP jitterbuffer and RTCP timer) this was not used due to reuse of existing C codebases.

Thread-Sharing GStreamer Elements

The code in question can be found here, a small benchmark is in the examples directory and it is going to be used for the results later. A full-featured benchmark will come in Mathieu’s blog post.

This is a new GStreamer plugin, written in Rust and around the Tokio crate for asynchronous IO and generally a “task scheduler”.

While this could certainly also have been written in C around something like libuv, doing this kind of work in Rust is simply more productive and fun due to its safety guarantees and the strong type system, which definitely reduced the amount of debugging a lot. And in addition “modern” language features like closures, which make working with futures much more ergonomic.

When using these elements it is important to have full control over the pipeline and its elements, and the dataflow inside the pipeline has to be carefully considered to properly configure how to share threads. For example the following two restrictions should be kept in mind all the time:

  1. Downstream of such an element, the streaming thread must never ever block for considerable amounts of time. Otherwise all other elements inside the same thread-group would be blocked too, even if they could do any work now
  2. This generally all works better in live pipelines, where media is produced in real-time and not as fast as possible

Available Elements

So this repository currently contains the generic infrastructure (see the src/iocontext.rs source file) and a couple of elements:

  • an UDP source: ts-udpsrc, a replacement for udpsrc
  • an app source: ts-appsrc, a replacement for appsrc to inject packets into the pipeline from the application
  • a queue: ts-queue, a replacement for queue that is useful for adding buffering to a pipeline part. The upstream side of the queue will block if not called from another thread-sharing element, but if called from another thread-sharing element it will pause the current task asynchronously. That is, stop the upstream task from producing more data.
  • a proxysink/src element: ts-proxysrc, ts-proxysink, replacements for proxysink/proxysrc for connecting two pipelines with each other. This basically works like the queue, but split into two elements.
  • a tone generator source around spandsp: ts-tonesrc, a replacement for tonegeneratesrc. This also contains some minimal FFI bindings for that part of the spandsp C library.

All these elements have more or less the same API as their non-thread-sharing counterparts.

API-wise, each of these elements has a set of properties for controlling how it is sharing threads with other elements, and with which elements:

  • context: A string that defines in which group this element is. All elements with the same context are running on the same thread or group of threads,
  • context-threads: Number of threads to use in this context. -1 means exactly one thread, 1 and above used N+1 threads (1 thread for polling fds, N worker threads) and 0 sets N to the number of available CPU cores. As long as no considerable work is done in these threads, -1 has shown to be the most efficient. See also this tokio GitHub issue
  • context-wait: Number of milliseconds that the threads will wait on each iteration. This allows to reduce CPU usage even further by handling all events/packets that arrived during that timespan to be handled all at once instead of waking up the thread every time a little event happens, thus reducing context switches again

The elements are all pushing data downstream from a tokio thread whenever data is available, assuming that downstream does not block. If downstream is another thread-sharing element and it would have to block (e.g. a full queue), it instead returns a new future to upstream so that upstream can asynchronously wait on that future before producing more output. By this, back-pressure is implemented between different GStreamer elements without ever blocking any of the tokio threads. All this is implemented around the normal GStreamer data-flow mechanisms, there is no “tokio fast-path” between elements.

Little Benchmark

As mentioned above, there’s a small benchmark application in the examples directory. This basically sets up a configurable number of streams and directly connects them to a fakesink, throwing away all packets. Additionally there is another thread that is sending all these packets. As such, this is really the most basic benchmark and not very realistic but nonetheless it shows the same performance improvement as the real application. Again, see Mathieu’s upcoming blog post for a more realistic and complete benchmark.

When running it, make sure that your user can create enough fds. The benchmark will just abort if not enough fds can be allocated. You can control this with ulimit -n SOME_NUMBER, and allowing a couple of thousands is generally a good idea. The benchmarks below were running with 10000.

After running cargo build –release to build the plugin itself, you can run the benchmark with:

cargo run --release --example udpsrc-benchmark -- 1000 ts-udpsrc -1 1 20

and in another shell the UDP sender with

cargo run --release --example udpsrc-benchmark-sender -- 1000

This runs 1000 streams, uses ts-udpsrc (alternative would be udpsrc), configures exactly one thread -1, 1 context, and a wait time of 20ms. See above for what these settings mean. You can check CPU usage with e.g. top. Testing was done on an Intel i7-4790K, with Rust 1.25 and GStreamer 1.14. One packet is sent every 20ms for each stream.

Source Streams Threads Contexts Wait CPU
udpsrc 1000 1000 x x 44%
ts-udpsrc 1000 -1 1 0 18%
ts-udpsrc 1000 -1 1 20 13%
ts-udpsrc 1000 -1 2 20 15%
ts-udpsrc 1000 2 1 20 16%
ts-udpsrc 1000 2 2 20 27%
Source Streams Threads Contexts Wait CPU
udpsrc 2000 2000 x x 95%
ts-udpsrc 2000 -1 1 20 29%
ts-udpsrc 2000 -1 2 20 31%
Source Streams Threads Contexts Wait CPU
ts-udpsrc 3000 -1 1 20 36%
ts-udpsrc 3000 -1 2 20 47%

Results for 3000 streams for the old udpsrc are not included as starting up that many threads needs too long.

The best configuration is apparently a single thread per context (see this tokio GitHub issue) and waiting 20ms for every iterations. Compared to the old udpsrc, CPU usage is about one third in that setting, and generally it seems to parallelize well. It’s not clear to me why the last test has 11% more CPU with two contexts, while in every other test the number of contexts does not really make a difference, and also not for that many streams in the real test-case.

The waiting does not reduce CPU usage a lot in this benchmark, but on the real test-case it does. The reason is most likely that this benchmark basically sends all packets at once, then waits for the remaining time, then sends the next packets.

Take these numbers with caution, the real test-case in Mathieu’s blog post will show the improvements in the bigger picture, where it was generally a quarter of CPU usage and almost perfect parallelization when increasing the number of contexts.

Conclusion

Generally this was a fun exercise and we’re quite happy with the results, especially the real results. It took me some time to understand how tokio works internally so that I can implement all kinds of customizations on top of it, but for normal usage of tokio that should not be required and the overall design makes a lot of sense to me, as well as the way how futures are implemented in Rust. It requires some learning and understanding how exactly the API can be used and behaves, but once that point is reached it seems like a very productive and performant solution for asynchronous IO. And modelling asynchronous IO problems based on the Rust-style futures seems a nice and intuitive fit.

The performance measurements also showed that GStreamer’s default usage of threads is not always optimal, and a model like in upipe or pipewire (or rather SPA) can provide better performance. But as this also shows, it is possible to implement something like this on top of GStreamer and for the common case, using threads like in GStreamer reduces the cognitive load on the developer a lot.

For a future version of GStreamer, I don’t think we should make the threading “manual” like in these two other projects, but instead provide some API additions that make it nicer to implement thread-sharing elements and to add ways in the GStreamer core to make streaming threads non-blocking. All this can be implemented already, but it could be nicer.

All this “only” improved the number of threads, and thus the threading and context switching overhead. Many other optimizations in other areas are still possible on top of this, for example optimizing receive performance and reducing the number of memory copies inside the pipeline even further. If that’s something you would be interested in, feel free to get in touch.

And with that: Read Mathieu’s upcoming blog posts about the other parts, RTP jitterbuffer / RTCP timer thread sharing, and no allocations, and the full benchmark.

GStreamer Rust bindings 0.11 / plugin writing infrastructure 0.2 release

Following the GStreamer 1.14 release and the new round of gtk-rs releases, there are also new releases for the GStreamer Rust bindings (0.11) and the plugin writing infrastructure (0.2).

Thanks also to all the contributors for making these releases happen and adding lots of valuable changes and API additions.

GStreamer Rust Bindings

The main changes in the Rust bindings were the update to GStreamer 1.14 (which brings in quite some new API, like GstPromise), a couple of API additions (GstBufferPool specifically) and the addition of the GstRtspServer and GstPbutils crates. The former allows writing a full RTSP server in a couple of lines of code (with lots of potential for customizations), the latter provides access to the GstDiscoverer helper object that allows inspecting files and streams for their container format, codecs, tags and all kinds of other metadata.

The GstPbutils crate will also get other features added in the near future, like encoding profile bindings to allow using the encodebin GStreamer element (a helper element for automatically selecting/configuring encoders and muxers) from Rust.

But the biggest changes in my opinion is some refactoring that was done to the Event, Message and Query APIs. Previously you would have to use a view on a newly created query to be able to use the type-specific functions on it

let mut q = gst::Query::new_position(gst::Format::Time);
if pipeline.query(q.get_mut().unwrap()) {
    match q.view() {
        QueryView::Position(ref p) => Some(p.get_result()),
        _ => None,
    }
} else {
    None
}

Now you can directly use the type-specific functions on a newly created query

let mut q = gst::Query::new_position(gst::Format::Time);
if pipeline.query(&mut q) {
    Some(q.get_result())
} else {
    None
}

In addition, the views can now dereference directly to the event/message/query itself and provide access to their API, which simplifies some code even more.

Plugin Writing Infrastructure

While the plugin writing infrastructure did not see that many changes apart from a couple of bugfixes and updating to the new versions of everything else, this does not mean that development on it stalled. Quite the opposite. The existing code works very well already and there was just no need for adding anything new for the projects I and others did on top of it, most of the required API additions were in the GStreamer bindings.

So the status here is the same as last time, get started writing GStreamer plugins in Rust. It works well!