Building a GStreamer plugin in Rust with meson instead of cargo

Over the Easter holidays I spent some time on a little experiment: How hard is it to build a GStreamer plugin in Rust with meson instead of using the default Rust build system cargo?

meson is a community-developed, open-source, cross-platform (including Windows), multi-language build system that is supposed to be fast and easy to use. It’s nowadays used as a build system by a lot of components of the GNOME desktop, by GStreamer, systemd, Mesa, X.org, QEMU and many other pieces of Free Software. Mostly for building C or C++ code, but also for configuring and installing Python or JavaScript code, Vala, and other languages.

Wouldn’t it be nice if we could also build software in Rust with it, build it together with existing code in other languages and have a common build system for it all? What would be the advantages and disadvantages of that, and what’s the current status of Rust support in meson? How much effort is it to make use of all the existing 100k software packages (“crates”) that are available for Rust?

Especially as most of the projects mentioned before are looking more or less seriously into adopting Rust as a safer and more modern successor to C, these seem like useful questions to investigate. Anecdotally, I also heard that a maintainer of one of these projects said that being able to use the same build system as the rest of the codebase would be a requirement to even consider the language. Another project is starting to write some components in Rust and building them with meson, but without depending on any external Rust dependencies for now.

Another reason for looking into this was that there seems to be the opinion that you can’t really use any build system apart from cargo for building Rust code, and that using meson would be very hard to impossible and involve a lot of disadvantages. This has led to all GNOME applications written in Rust currently having a chimera of a build system using both meson and cargo, because neither of the two does everything these applications need. Such a setup is hard to maintain and debug, and probably almost nobody really understands it. cargo’s design does not make embedding into another build system easy, and both cargo and meson have very specific, and to some degree incompatible, opinions about how things have to be done. Let’s see if that’s actually necessary and what’s missing to move away from that. As Facebook is apparently using buck to build part of their Rust code, and Google bazel, this shouldn’t really be impossible.

As I’m a GStreamer developer, the maintainer of its Rust bindings and the one who started writing plugins for it in Rust, trying to build a GStreamer plugin in Rust with meson instead of cargo seemed like the obvious choice for this experiment.

However, everything here applies in the same way to building GTK applications with its Rust bindings or similarly to any of the software mentioned before for writing components of them in Rust.

EDIT: After publishing the first version I was told that meson actually supports a couple of things that I missed before. For example, running Rust tests is already supported just fine (though it is more verbose in the build definition than with cargo), and meson can already install executables setuid root by itself. There is also an equivalent to cargo init, meson init, which makes starting new projects a bit more convenient. I also wrote that cargo clippy is not really supported yet; while true, you can get around that by telling meson (or any other build system) to use clippy-driver as the compiler instead of rustc. I’ve updated the text below accordingly.

Summary

The code for my experiment can be found here, but at the time of writing it needs some changes to meson that are not merged yet. A list of those changes with some explanation can be found further below. The git repository also includes a cargo build system that gives the same build results for comparison purposes.

I should also make clear that this is only an experiment at this point, and while it works fine, it involves more manual work than necessary. If you depend on a lot of existing crates from crates.io then you probably want to wait a bit longer before considering meson; more on that later. However, if you don’t have to depend on a lot of crates, your codebase is relatively self-contained and maybe even has to be built together with C code, then meson is already a viable alternative and has some advantages to offer compared to cargo. But also some disadvantages.

Almost all of the manual work I did as part of this experiment can be automated, and a big part of that is not going to be a lot of work either. I didn’t do that here to get an idea of the problems that would actually be encountered in practice when implementing such an automated system. I’ll get to that in more detail at the very end.

In summary I would say that

  • meson is less convenient and less declarative than cargo, but in exchange more powerful and flexible
  • meson’s Rust support is not very mature yet and there’s very little tooling integration
  • cargo is great and easy to use if your project falls into the exact pattern it handles but easily becomes annoying for you and your users otherwise
  • the developer experience is much better with cargo currently but the build and deployment experience is better with meson

More on each of these items below.

Procedure

As a goal I wanted to build one of the parts of the gst-plugins-rs tutorial, specifically an identity-kind of element that simply passes through its input, and build that into a GStreamer plugin that can be loaded into any GStreamer process. For that it has to be built as a cdylib in Rust terms: a shared library that offers a C ABI. For this purpose, meson already has a big advantage over cargo in that it can actually install the build result in its correct place, while cargo can only install executables right now. But more on that part later too.
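For comparison, with cargo the same is declared in the Cargo.toml of the plugin. A minimal sketch (the actual manifest of the plugin contains more than this):

[lib]
name = "gstmesontest"
# Build a shared library with a C ABI instead of a Rust library
crate-type = ["cdylib"]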

The main task for this was translating all the Rust crate dependencies from cargo to meson, manually one by one. Overall 44 dependencies were needed, and I translated them into so-called meson wrap dependencies. The wrap dependency system of meson, similar to cargo, allows downloading the source code of dependencies from another location (in this case crates.io) and extending it with a patch, in this case to add the meson-based build system for each.

In practice this meant creating a lot of meson.build files based on the Cargo.toml of each crate. The following is the meson.build for the toml_edit crate.

project('toml_edit-rs', 'rust',
  version : '0.19.8',
  meson_version : '>= 1.0.0',
  default_options : ['buildtype=debugoptimized',
                     'rust_std=2021'
                    ]
)

rustc = meson.get_compiler('rust')

toml_datetime_dep = dependency('toml_datetime-rs', version : ['>= 0.6', '< 0.7'])
winnow_dep = dependency('winnow-rs', version : ['>= 0.4', '< 0.5'])
indexmap_dep = dependency('indexmap-rs', version : ['>= 1.9.1', '< 2.0'])

features = []
features += ['--cfg', 'feature="default"']

toml_edit = static_library('toml_edit', 'src/lib.rs',
  rust_args : features,
  rust_crate_type : 'rlib',
  dependencies : [toml_datetime_dep, winnow_dep, indexmap_dep],
  pic : true,
)

toml_edit_dep = declare_dependency(link_with : toml_edit)

and the corresponding wrap file

[wrap-file]
directory = toml_edit-0.19.8
source_url = https://crates.io/api/v1/crates/toml_edit/0.19.8/download
source_filename = toml_edit-0.19.8.tar.gz
source_hash = 239410c8609e8125456927e6707163a3b1fdb40561e4b803bc041f466ccfdc13
diff_files = toml_edit-rs.meson.patch

[provide]
toml_edit-rs = toml_edit_dep

As can be seen from the above, this could all be autogenerated from the corresponding Cargo.toml, and that’s the case for a lot of crates. There have also been plans and ideas in the meson community for quite a while to actually develop such a tool, but so far this hasn’t materialized. Maybe my experiment can provide some motivation to actually start with that work.

For simplicity, when translating these by hand I didn’t consider including

  • any optional dependencies unless needed for my tasks
  • any tests, examples, executables as part of the crate
  • any cargo feature configuration other than exactly what I needed

All of this can be easily done with meson though, and an automated tool for translating Cargo.toml into meson wraps should easily be able to handle that. For cargo features there are multiple ways of mapping them to meson, so some conventions have to be defined first of all. Similarly for naming Rust crates as meson dependencies it will be necessary to define some conventions to allow sharing between projects.
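As a sketch of one possible mapping, a cargo feature could become a meson boolean option that is translated into the corresponding --cfg compiler argument. The option name and the naming convention here are hypothetical:

# meson_options.txt (hypothetical naming convention)
option('feature-default', type : 'boolean', value : true)

# meson.build
features = []
if get_option('feature-default')
  features += ['--cfg', 'feature="default"']
endif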

The hard part of translating some of the crates was the cargo build.rs build scripts. These build scripts allow running arbitrary Rust code as part of the build, which will also make automatic translation challenging. More on that later.

Once I had translated all the 44 dependencies, including the GStreamer Rust bindings, various procedural Rust macros and a lot of basic crates of the Rust ecosystem, I copied the plugin code into the new repository, changed some minor things and could then write a meson.build file for it.

project('gst-plugin-meson-test', 'rust',
  version : '0.0.1',
  meson_version : '>= 1.0.0',
  default_options : ['buildtype=debugoptimized',
                     'rust_std=2021'
                    ]
)

plugins_install_dir = '@0@/gstreamer-1.0'.format(get_option('libdir'))

rustc = meson.get_compiler('rust')

add_global_arguments(
  '-C', 'embed-bitcode=no',
  language: 'rust'
)

gst_dep = dependency('gstreamer-rs', version : ['>= 0.20', '< 0.21'])
once_cell_dep = dependency('once_cell-rs', version : ['>= 1.0', '< 2.0'])

gst_plugin_meson_test = shared_library('gstmesontest', 'src/lib.rs',
  rust_crate_type : 'cdylib',
  rust_dependency_map : {
    'gstreamer' : 'gst',
  },
  dependencies : [gst_dep, once_cell_dep],
  install : true,
  install_dir : plugins_install_dir,
)

To get everything building like this with the latest meson version (1.1.0), some additional changes are needed at this point: 1, 2, 3 and 4. Also, currently all of this only supports native compilation. Cross-compilation of proc-macro crates in meson is currently broken but should not be too hard to fix in the future.

A careful reader will also notice that all crates with a dash (-) in their name currently keep the dash in their dependency name, but the library they’re building has the dash replaced by an underscore. This is due to a rustc bug (or undocumented behaviour?), and meson will likely require this translation of forbidden characters in crate names for the foreseeable future.

With all that in place, building the project is a matter of running

$ meson builddir
$ ninja -C builddir

compared to a single command with cargo

$ cargo build

Note that the meson build will be slower because by default meson already builds with optimizations (-C opt-level=2) while cargo does not.

Comparison between cargo and meson

In the following section I’m going to compare some aspects of cargo and meson, and give my general impression of both. To take the conclusion ahead, my perfect build system would include aspects of both cargo and meson and currently both are lacking in their own way. Like with any other tool (or programming language, for that matter): if you don’t have anything to complain about it you don’t know it well enough yet or don’t know any of the alternatives.

To avoid people focusing only on the negative aspects, I’m also going to try to describe all the shortcomings of one build system as a good feature of the other. This is no competition, but both meson and cargo can learn a lot from each other.

Build Times

Above I briefly mentioned build times. And because that’s something everybody is interested in and a common complaint about Rust, let’s start with that topic, even if it’s in my opinion the most boring one.

Overall you would expect both build systems to behave the same if they do the same work, as they are both basically just sophisticated Rust compiler process spawners. If you want to improve build times then your time is probably better spent on the Rust compiler itself and LLVM.

All times below are measured on my system very unscientifically with time. This is all only to give an idea of the general behaviour and to check if there are conceptual inefficiencies or problems. Also, when reproducing these results make sure to pass -Dbuildtype=debug to meson for comparable results between meson and cargo.
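Concretely, that means configuring the build directory like this before building:

$ meson builddir -Dbuildtype=debug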

  • Running meson (build configuration): 1.4s (1.0s user, 0.4s system)
  • Running ninja (actual build): 10.4s (23.5s user, 3.2s system)
  • Running cargo: 11.0s (34.1s user, 6.6s system)

One thing that shows immediately is that both need approximately the same time. The build alone takes slightly less time than cargo, the configure and build steps together slightly more. So far this is what would be expected. However, cargo uses almost 45% more CPU time. I didn’t investigate this in great detail, but the two main reasons are likely

  • cargo is building and running a build.rs build script in 23 of the 44 dependencies, which takes a while and also pulls in some more dependencies that are not needed for the meson build
  • meson currently parallelizes the Rust build less well than cargo, otherwise the build step would likely be 10% or more faster

cargo build scripts

Interestingly, the main pain point when translating the crate build systems from cargo to meson also seems to be the main factor for cargo being more inefficient than it could be. This also seems to be a known fact in the wider Rust community by now.

But in addition to being inefficient and painful to translate (especially automatically), it is in my opinion also a maintenance nightmare and literally the worst part of cargo. There is nothing declarative about what a build script is actually trying to do, it’s very easy to write build scripts that don’t work correctly in other environments, and two crates doing the same thing in a build script are generally going to behave differently in non-obvious ways.

For the crates that I translated, the reasons why build scripts existed in 23 of the 44 crates were all things that cargo should really provide directly and that meson can already do directly:

  • Checking which version of the Rust compiler is used and based on that enabling/disabling various features
  • Checking features or versions of the underlying platform or operating system
  • Checking for existence of native, external libraries or even building them
  • Code generation (not needed by these crates, but a common reason for build scripts elsewhere)

Especially the last two points are painful for build and deployment of Rust code, and that every crate has its own special way of solving it in a build script doesn’t make it better. And in the end both tasks are exactly what you have a build system for: building things, tracking dependencies between them and generating an ideal build plan or schedule.

The system-deps crate provides a way of expressing external dependencies declaratively as part of the Cargo.toml, and a similar system built into cargo and integrated with the platform’s mechanism for finding libraries would be a huge improvement. Similar approaches for the other aspects would also be helpful, and not just for making a different build system for crates, but also for developers using cargo.
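For illustration, with system-deps an external library is declared in the Cargo.toml metadata instead of in imperative build script code. A minimal sketch following the pattern used by e.g. the GStreamer -sys crates (see the system-deps documentation for the full syntax):

[package.metadata.system-deps.gstreamer_1_0]
# "name" is the pkg-config name of the external library
name = "gstreamer-1.0"
version = "1.14"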

I’m sure that even once cargo gains features for handling the four items above there will still be a need for custom build scripts in some situations, but these four items should cover more than 95% of the currently existing build scripts. It’s a mystery to me why there are no apparent efforts being made to improve cargo in this regard.

Good Parts of meson

Now let’s focus on some of the good parts of meson in comparison to cargo.

More flexible and expressive

Generally, meson has a much more flexible and expressive build definition language. It looks more like a scripting language than the toml files used by cargo, and as such appears more complex.

However, thanks to this approach, almost anything for which cargo requires custom build scripts, with the disadvantages listed above, can be written directly as part of the meson build definitions. Expressing many of these things in toml for cargo would likely become convoluted and not very straightforward, as can already be seen nowadays with e.g. platform-specific dependencies.

meson provides built-in modules with support for e.g.

  • finding and checking versions of compilers of different languages and other build tools, e.g. for code generation, including the Rust bindgen tool
  • testing compilation of code snippets
  • finding external library dependencies and checking their versions
  • defining shared/static library, executable and code generation build targets, together with how (and whether) they should be installed later
  • installing (and generating) data files, including specific support for e.g. library metadata (pkg-config), documentation, gobject-introspection, …
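A short sketch of some of these building blocks in a meson.build (assuming a project that also has a C compiler available; the dependency and program names are just examples):

cc = meson.get_compiler('c')

# Find an external library, e.g. via pkg-config, and check its version
glib_dep = dependency('glib-2.0', version : '>= 2.64')

# Check whether a function exists on the target platform
have_log10 = cc.has_function('log10', prefix : '#include <math.h>')

# Find an external tool, e.g. for code generation
bindgen = find_program('bindgen', required : false)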

As an escape hatch, whenever something can’t be expressed by just meson, it is also possible to run external configuration/build/install scripts that could be written as a Python/shell script, C or Rust executable, or anything else really. This is rarely needed though and the meson project seems to add support for anything that people actually need in practice.

Build configuration system

meson provides an extensible configuration mechanism for the build, which allows the user building the software to customize the build process and its results.

  1. The built-in options allow for configuring things like switching between debug and release builds, defining toolchains and defining where to install build results.

  2. The build options allow each project to extend the above with custom configuration, e.g. for enabling/disabling optional features of the project or generally selecting between different configurations. Apart from simple boolean flags this also allows for other types of configuration, including integers and strings.
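As a sketch of the second mechanism, a project declares custom options in a meson_options.txt file and queries them from the build definitions (the option names here are made up):

# meson_options.txt
option('examples', type : 'boolean', value : false, description : 'Build example applications')
option('max-channels', type : 'integer', value : 64, description : 'Maximum number of channels')

# meson.build
if get_option('examples')
  subdir('examples')
endif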

cargo only provides the first of the two, in the form of fixed, non-extensible compiler profiles and configuration.

The second is not to be confused with cargo’s feature flags, which are more a mechanism for the developer of the software to select the configuration of dependencies. The meson build configuration, however, is for the builder of the software to select between different configurations. While for executables cargo feature flags are sometimes used in a similar way for boolean configurations, cargo does not provide anything for this in general.

As a workaround for this, a couple of Rust crates are using environment variables together with the env! macro for build configuration, but this is fragile and not discoverable, and more of a hack than a real solution.
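A small, hypothetical example of that pattern: the value is baked into the binary at compile time, and nothing tells the user that the variable exists or that it silently falls back to a default when unset.

// Hypothetical variable name: MYCRATE_BACKEND has to be set in the
// environment of the rustc invocation to have any effect.
const BACKEND: &str = match option_env!("MYCRATE_BACKEND") {
    Some(backend) => backend,
    None => "default",
};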

Support for Rust and non-Rust code in the same project

meson supports mixing Rust and non-Rust code in the same project, and allows tracking dependencies between targets using different languages in the same way.

While mixing Rust and e.g. C code in the same build target is not supported due to Rust’s compilation model, it is possible to build a static Rust library and a static C library and link both together into, say, a D application or a Python module. An example of this would be this Python module that combines C, C++, Rust and Fortran code.
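A minimal sketch of what that could look like in a meson.build (all file and target names are made up):

project('mixed', ['c', 'rust'])

# Rust code compiled to a static library with a C ABI
rs_lib = static_library('hello_rs', 'src/hello.rs',
  rust_crate_type : 'staticlib',
)

# C code built in the same project
c_lib = static_library('helpers', 'src/helpers.c')

# A C executable linked against both
executable('hello', 'src/main.c',
  link_with : [rs_lib, c_lib],
)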

Code generation can be handled in a similar way as in the end code generation is just another transformation from one format into another.

cargo doesn’t directly support anything but Rust code. As usual, build scripts provide a mechanism to get around this limitation. The cc crate for example is widely used to build C code and there are also crates for building meson or cmake based software as part of the build process.

All of this is completely opaque to cargo though and can’t be taken into account for defining an optimal build schedule, can’t be configured from the outside and regularly fails in non-standard build situations (e.g. cross-compilation).

Installation of other files than executables

meson allows every build result to be installed in a configurable location. This is especially useful for more complex applications that might have to provide various data files or come as an executable plus multiple libraries or plugins, or simply for projects that only provide a shared library. If any of the built-in installation mechanisms are not sufficient (e.g. the executable should get specific process capabilities set via setcap), meson also allows customizing the install process via scripts.

cargo only allows installing executables right now. There are cargo extensions that allow for more complex tasks, e.g. cargo xtask, but there is no standard mechanism. There once was an RFC to make cargo’s installation process extensible, but the way it was proposed would suffer from the same problems as the cargo build scripts.

External dependency and library support

In addition to mixing build targets with multiple languages in the same project, meson also has a mechanism to find external dependencies in different ways. If an external dependency is not found, it can be provided and built as part of the project via the wrap mechanism mentioned before. The latter is similar to how cargo handles dependencies, but the former is missing from cargo completely and is currently implemented via build scripts instead, e.g. by using the pkg-config crate.

As Rust does not currently provide a stable ABI and has no standard mechanism to locate library crates on the system, this mostly applies to library dependencies written in other languages. meson does support building and installing Rust shared/static libraries too, but because of the lack of a stable ABI this has to be used very carefully.

On the other hand, Rust allows building shared/static libraries that provide the stable C ABI of the platform (the cdylib and staticlib crate types). meson can build these correctly too, and also offers mechanisms for installing them together with their (potentially autogenerated) header files and locating them again later in other projects via e.g. pkg-config.
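As a sketch, building such a cdylib and generating a pkg-config file for it from meson could look like this (names and metadata are placeholders):

pkg = import('pkgconfig')

foo = shared_library('foo', 'src/lib.rs',
  rust_crate_type : 'cdylib',
  version : '1.0.0',
  install : true,
)

pkg.generate(foo,
  name : 'foo',
  description : 'Example C-ABI library implemented in Rust',
)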

For cargo this job can be taken care of by cargo-c, which also takes care of building shared libraries correctly, e.g. by setting the soname and other kinds of versioning information.

Good Parts of cargo

After writing so much about meson and how great it is, let’s now look at some aspects of cargo that are better than what meson provides. Like I said before, both have their good sides.

Simpler and more declarative build definitions

The cargo manifest format is clearly a lot simpler and more declarative than the meson build definition format.

For a simple project it looks more like a project description than something written in a scripting language.

[package]
name = "hello_world"
version = "0.1.0"
edition = "2021"

[dependencies]
anyhow = "1.0"

As long as a project stays within the boundaries of what cargo makes easy to express, which should be the case for the majority of existing Rust projects, it is going to be simpler than meson. The various features missing from cargo that then require the use of a build script currently prevent this for many crates, but that seems like something that could easily be improved.

meson on the other hand feels more like writing actual build scripts in some kind of scripting language, and information like “what dependencies does this have” is not as easily visible as from something like a Cargo.toml.

Tooling integration

cargo provides a lot of development tools that make development with it a very convenient and smooth experience. There are also dozens of cargo extension commands that provide additional features on top of cargo.

cargo init creates a new project, cargo add adds new dependencies, cargo check type-checks code without full compilation, cargo clippy runs a powerful linter, cargo doc builds the documentation (incl. dependencies), cargo bench and cargo test allow running tests and benchmarks, cargo show-asm shows the generated, annotated assembly for the code, cargo udeps finds unused dependencies, …

All of this makes development of Rust projects a smooth and well-integrated experience.

In addition rust-analyzer provides a lot of IDE features via the LSP protocol for various editors and IDEs.

Right now, IDEs and editors assume that Rust projects use cargo and offer integration with its features.

On the other hand, when using meson almost none of this is currently provided and development feels less well integrated. Right now the only features meson provides for making Rust development easier are the generation of a rust-project.json for use with rust-analyzer, the ability to run tests in a similar way to cargo, and of course actually building the code. Building documentation could easily be added to meson and is already supported for other languages; something like cargo add already exists for wrap dependencies, and adding crates.io support to it would be possible, but it’s going to take a while and a bit of effort to handle crates with cargo build scripts. Making use of all the cargo extension commands without actually using cargo seems unrealistic.

In the end, cargo is the default build system for Rust and everything currently assumes usage of cargo so using cargo offers the best developer experience.

Rust dependencies

As briefly mentioned above, cargo add makes it extremely easy to add new Rust dependencies to a project and build them as part of the project. This helps a lot with code reuse and modularity. That an average Rust project has dozens of direct dependencies and maybe a hundred or more indirect dependencies shows this quite clearly, as does the encapsulation of very small tasks in separate dependencies, compared to the huge multi-purpose libraries that are common with other languages.

cargo also directly handles updating of dependencies via cargo update, including making sure that only semver-compatible versions are automatically updated, and it allows including multiple incompatible versions of a dependency in the same build if necessary.

In addition to just adding dependencies, cargo features allow for conditional compilation and for defining which parts and features of a dependency should be enabled or not.
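For example, a trait implementation gated on a cargo feature is only compiled when a dependent crate enables that feature (the feature name matches a hypothetical optional serde dependency, and the type is made up):

pub struct Sample(pub u64);

// Only compiled when the "serde" cargo feature is enabled
#[cfg(feature = "serde")]
impl serde::Serialize for Sample {
    fn serialize<S: serde::Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
        serializer.serialize_u64(self.0)
    }
}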

meson has some explicit handling of dependencies, and the wrap system also allows building external dependencies as part of the project, but adding or updating dependencies is a more manual process unless they are in the meson wrapdb. For Rust crates it is completely manual for now. It is no different from adding dependencies in non-Rust languages though.

There is also no direct equivalent of the cargo feature flags, which potentially seems like a useful addition to meson.

Next steps

Considering all of the above, there is not really a simple answer to which of the two choices is the best for your project. As of now I would personally always use cargo for Rust projects if I can, especially if they will have other Rust dependencies. It’s a lot more convenient to develop with cargo.

However, various features of meson might make it a better choice for some projects, maybe already now or at least in the future: for example, the ability to handle multiple languages, to handle dependencies and shared libraries correctly, or to install data files together with the application. Or, simply because the remainder of the project already uses meson for compiling e.g. C code, meson might be the more natural choice for adding Rust code to the mix.

As outlined above, there are many areas of meson that could be improved and where Rust support is not very mature yet, and to make meson a more viable alternative to cargo these improvements will have to happen sooner or later. Similarly, there are various areas where cargo could also be improved and learn from meson, and where improvements, in addition to making cargo more flexible and easier to use, would also make it easier for other build systems to handle Rust code.

From a meson point of view, I would consider the following the next steps.

Various bugs and missing features

rustc and cargo

On the Rust compiler side, there are currently two minor issues that would need looking into.

  • rustc#80792: Adding support for passing environment variables via the command-line instead of using actual environment variables. Environment variables are currently used for build configuration by many crates. Being able to provide them via the command-line would allow for cleaner and more reliable build rules without accidentally leaking actual environment variables into the build.
  • rustc#110460: Undocumented and maybe unintentional library filename requirement based on the crate name. This currently requires meson to disallow dashes in crate names and to have them explicitly replaced by underscores, something cargo does implicitly. Implicit conversion inside meson would also be feasible but probably not desirable, because there would then be a mismatch between the name of the build target in the build definition and the actual name of the build target when building it. Similar to the mild confusion that some people ran into when noticing that a crate with a dash in its name can only be referenced with underscores from Rust code.

In addition it would be useful to look into moving various features from build scripts into cargo proper, as outlined above. This will probably need a lot of design and discussion effort, and even after implementation it will likely take years until the important crates have moved to it, due to the very conservative Rust toolchain version requirement policies in many of these crates.

So on the cargo side that’s more something for the long run but also something that would greatly benefit users of cargo.

meson

On the meson side there are a couple of bugs of different severity that will have to be solved, and probably more that will show up once more people are starting to use meson to build Rust projects.

In addition to the ones I mentioned above that are already merged into the meson git repository, at the time of writing there were for example the following outstanding issues:

  • meson#11681: Add a feature to meson to allow renaming crates when they are used as a dependency. This is used throughout the Rust ecosystem for handling multiple versions of the same crate at once, or simply for having a more convenient local name for a dependency.
  • meson#11695: Parallelize the Rust build better by already starting to build the next Rust targets when the metadata of the dependencies is available. This should bring build time improvements of 10% or more on machines with enough cores, and the lack of it is the main reason why the meson build in my experiment was “only” as fast as the cargo build and not faster.
  • meson#11702: Cross-compilation of proc-macro crates is currently using the wrong toolchain and simply doesn’t work.
  • meson#11694: Indirect dependencies of all dependencies are passed on to later compiler invocations, which brings the risk of unnecessary name conflicts and simply causes more work for the compiler than necessary.
  • meson#10030: Add support for passing environment variables to the Rust compiler. As mentioned above, many crates are currently using this for build configuration so this would have to be supported by meson in one way or another.

Apart from the second one in this list, these should all be doable relatively fast, and generally getting Rust-related fixes and improvements merged into meson has been a fast and pleasant experience so far. I didn’t encounter any unnecessary bikeshedding or stop energy.

Tooling for managing cargo dependencies

During my experiment I wrote all the meson wrap files manually. This does not really scale, is inconvenient, and also makes it harder to update dependencies later.

The goal here would be to provide a tool in the shape of cargo add and cargo update that can automatically add cargo-based Rust dependencies to a meson project. This is something that has been discussed a lot in the past, and various people in the meson community have ideas and plans around it. meson already has something similar for the wrapdb, meson wrap add and meson wrap update; the idea would be to have something similar to (or integrated into) those commands that directly supports crates from crates.io, so that Rust dependencies can be added to a meson project with as little effort as to a cargo project.

Apart from the cargo build scripts this shouldn’t be a lot of effort and a project for an afternoon at most, so maybe I’ll give that a try one of these days if nobody beats me to it.

As part of such a tool it will also be necessary to define conventions for naming, mapping of cargo features, versioning, etc. of Rust crates inside meson, and this should ideally be done early on to avoid unnecessary churn. The way I did it as part of my experiment has various drawbacks with regards to versioning and needs improvements.

Handling cargo build scripts is a bigger issue though. As my experiment showed, about half of the crates had build scripts. While all of them were more or less trivial, automatically translating this into meson build definitions seems unrealistic to me.

It might be possible to have meson use the cargo build scripts directly in one way or another, or they would have to be translated manually into meson build definitions. The latter would considerably improve build times, so it seems like the better approach, at least for common crates. For those, the meson build definitions could be stored in a central place like the meson wrapdb, or maybe even be included in the crates on crates.io if their maintainers feel like dealing with two build systems.

Together with all this, some thought will also have to be put into how to locate such Rust dependencies, similar to how pkg-config allows locating shared libraries. For example, Linux distributions will want to package such dependencies and make sure that a project built for such a distribution makes use of the packaged dependencies instead of using any other version, or worse, downloading some version from the Internet at build time. The way this is currently handled by cargo is also not optimal for Linux distributions and a couple of other build and deployment scenarios.

Because of the lack of a stable Rust ABI this would mean locating Rust source code.

Tooling integration

And last, as mentioned above there is basically no tooling integration right now apart from being able to build Rust code and using rust-analyzer. meson should at least support the most basic tasks that cargo supports, and that meson already supports for other languages: running tests and benchmarks, running linters and building documentation.

Once those basic tasks are done, it might be worth investigating other tooling integration like the various cargo extension commands offer or extending those commands to handle other build systems, e.g. via the rust-project.json that rust-analyzer uses for that purpose.

Instantaneous RTP synchronization & retrieval of absolute sender clock times with GStreamer

Over the last few weeks, GStreamer’s RTP stack got a couple of new and quite useful features. As they can be difficult to configure, mostly because there are so many different possible configurations, I decided to write about this a bit with some example code.

The features are RFC 6051-style rapid synchronization of RTP streams, which can be used for inter-stream (e.g. audio/video) synchronization as well as inter-device (i.e. network) synchronization, and the ability to easily retrieve absolute sender clock times per packet on the receiver side.

Note that each of these was already possible before with GStreamer via different mechanisms with different trade-offs. Obviously, not having working audio/video synchronization would simply not be acceptable, and I have talked about how to do inter-device synchronization with GStreamer before, for example at the GStreamer Conference 2015 in Düsseldorf.

The example code below will make use of the GStreamer RTSP Server library but can be applied to any kind of RTP workflow, including WebRTC, and is written in Rust, but the same can also be achieved in any other language. The full code can be found in this repository.

And for reference, the merge requests to enable all this are [1], [2] and [3]. You probably don’t want to backport those to an older version of GStreamer though as there are dependencies on various other changes elsewhere. All of the following needs at least GStreamer from the git main branch as of today, or the upcoming 1.22 release.

Baseline Sender / Receiver Code

The starting point of the example code can be found here in the baseline branch. All the important steps are commented so it should be relatively self-explanatory.

Sender

The sender is starting an RTSP server on the local machine on port 8554 and provides a media with H264 video and Opus audio on the mount point /test. It can be started with

$ cargo run -p rtp-rapid-sync-example-send

After starting the server it can be accessed via GStreamer with e.g. gst-play-1.0 rtsp://127.0.0.1:8554/test or similarly via VLC or any other software that supports RTSP.

This does not do anything special yet but lays the foundation for the following steps. It creates an RTSP server instance with a custom RTSP media factory, which in turn creates custom RTSP media instances. All this is not needed at this point yet but will allow for the necessary customization later.

One important aspect here is that the base time of the media’s pipeline is set to zero

pipeline.set_base_time(gst::ClockTime::ZERO);
pipeline.set_start_time(gst::ClockTime::NONE);

This allows the timeoverlay element that is placed in the video part of the pipeline to render the clock time over the video frames. We’re going to use this later to confirm on the receiver that the clock time on the sender and the one retrieved on the receiver are the same.

let video_overlay = gst::ElementFactory::make("timeoverlay", None)
    .context("Creating timeoverlay")?;
[...]
video_overlay.set_property_from_str("time-mode", "running-time");

It actually only supports rendering the running time of each buffer, but in a live pipeline with the base time set to zero the running time and pipeline clock time are the same. See the documentation for some more details about the time concepts in GStreamer.

Overall this creates the RTSP stream producer bin that will also be used in all the following steps.

Receiver

The receiver is a simple playbin pipeline that plays an RTSP URI given via command-line parameters and runs until the stream is finished or an error has happened.

It can be run with the following once the sender is started

$ cargo run -p rtp-rapid-sync-example-recv -- "rtsp://192.168.1.101:8554/test"

Please don’t forget to replace the IP with the IP of the machine that is actually running the server.

All the code should be familiar to anyone who ever wrote a GStreamer application in Rust, except for one part that might need a bit more explanation

pipeline.connect_closure(
    "source-setup",
    false,
    glib::closure!(|_playbin: &gst::Pipeline, source: &gst::Element| {
        source.set_property("latency", 40u32);
    }),
);

playbin is going to create an rtspsrc, and at that point it will emit the source-setup signal so that the application can do any additional configuration of the source element. Here we’re connecting a signal handler to that signal to do exactly that.

By default rtspsrc introduces a latency of 2 seconds, which is a lot more than what is usually needed. For live, non-VOD RTSP streams this value should be around the network jitter, and here we’re configuring it to 40 milliseconds.

Retrieval of absolute sender clock times

Now as the first step we’re going to retrieve the absolute sender clock times for each video frame on the receiver. They will be rendered by the receiver at the bottom of each video frame and will also be printed to stdout. The changes between the previous version of the code and this version can be seen here and the final code here in the sender-clock-time-retrieval branch.

When running the sender and receiver as before, the video from the receiver should look similar to the following screenshot.

The upper time that is rendered on the video frames is rendered by the sender, the bottom time is rendered by the receiver and both should always be the same unless something is broken here. Both times are the pipeline clock time when the sender created/captured the video frame.

In this configuration the absolute clock times of the sender are provided to the receiver via the NTP / RTP timestamp mapping in the RTCP Sender Reports. That’s also the reason why it takes about 5 seconds until the receiver knows the sender’s clock time: RTCP packets are not scheduled very often, by default only about every 5 seconds. The RTCP interval can be configured on rtpbin, together with many other things.

Sender

On the sender-side the configuration changes are rather small and not even absolutely necessary.

rtpbin.set_property_from_str("ntp-time-source", "clock-time");

By default the NTP time used in the RTCP packets is based on the local machine’s walltime clock converted to the NTP epoch. While this works fine, this is not the clock that is used for synchronizing the media, and as such there will be drift between the RTP timestamps of the media and the NTP time from the RTCP packets, which will be reset every time the receiver receives a new RTCP Sender Report from the sender.

Instead, we configure rtpbin here to use the pipeline clock as the source for the NTP timestamps used in the RTCP Sender Reports. This doesn’t give us (by default at least, see later) an actual NTP timestamp but it doesn’t have the drift problem mentioned before. Without further configuration, in this pipeline the used clock is the monotonic system clock.

rtpbin.set_property("rtcp-sync-send-time", false);

rtpbin normally uses the time when a packet is sent out for the NTP / RTP timestamp mapping in the RTCP Sender Reports. This is changed with this property to instead use the time when the video frame / audio sample was captured, i.e. it does not include all the latency introduced by encoding and other processing in the sender pipeline.

This doesn’t make any big difference in this scenario but usually one would be interested in the capture clock times and not the send clock times.

Receiver

On the receiver-side there are a few more changes. First of all we have to opt in to rtpjitterbuffer putting reference timestamp metadata on every received packet with the sender’s absolute clock time.

pipeline.connect_closure(
    "source-setup",
    false,
    glib::closure!(|_playbin: &gst::Pipeline, source: &gst::Element| {
        source.set_property("latency", 40u32);
        source.set_property("add-reference-timestamp-meta", true);
    }),
);

rtpjitterbuffer will start putting the metadata on packets once it knows the NTP / RTP timestamp mapping, i.e. after the first RTCP Sender Report is received in this case. Between the Sender Reports it is going to interpolate the clock times. The normal timestamps (PTS) on each packet are not affected by this and are still based on whatever clock is used locally by the receiver for synchronization.

To actually make use of the reference timestamp metadata we add a timeoverlay element as video-filter on the receiver:

let timeoverlay =
    gst::ElementFactory::make("timeoverlay", None).context("Creating timeoverlay")?;

timeoverlay.set_property_from_str("time-mode", "reference-timestamp");
timeoverlay.set_property_from_str("valignment", "bottom");

pipeline.set_property("video-filter", &timeoverlay);

This will then render the sender’s absolute clock times at the bottom of each video frame, as seen in the screenshot above.

And last we also add a pad probe on the sink pad of the timeoverlay element to retrieve the reference timestamp metadata of each video frame and print the sender’s clock time to stdout:

let sinkpad = timeoverlay
    .static_pad("video_sink")
    .expect("Failed to get timeoverlay sinkpad");
sinkpad
    .add_probe(gst::PadProbeType::BUFFER, |_pad, info| {
        if let Some(gst::PadProbeData::Buffer(ref buffer)) = info.data {
            if let Some(meta) = buffer.meta::<gst::ReferenceTimestampMeta>() {
                println!("Have sender clock time {}", meta.timestamp());
            } else {
                println!("Have no sender clock time");
            }
        }

        gst::PadProbeReturn::Ok
    })
    .expect("Failed to add pad probe");

Rapid synchronization via RTP header extensions

The main problem with the previous code is that the sender’s clock times are only known once the first RTCP Sender Report is received by the receiver. There are many ways to configure rtpbin to make this happen faster (e.g. by reducing the RTCP interval or by switching to the AVPF RTP profile), but in any case the information would be transmitted outside the actual media data flow and it can’t be guaranteed that it is actually known on the receiver from the very first received packet onwards. This is of course not a problem in every use-case, but for the cases where it is, there is a solution.

RFC 6051 defines an RTP header extension that allows transmitting the NTP timestamp that corresponds to an RTP packet directly together with that very packet. And that’s what the next changes to the code make use of.

The changes between the previous version of the code and this version can be seen here and the final code here in the rapid-synchronization branch.

Sender

To add the header extension on the sender-side it is only necessary to add an instance of the corresponding header extension implementation to the payloaders.

let hdr_ext =
    gst_rtp::RTPHeaderExtension::create_from_uri("urn:ietf:params:rtp-hdrext:ntp-64")
        .context("Creating NTP 64-bit RTP header extension")?;
hdr_ext.set_id(1);
video_pay.emit_by_name::<()>("add-extension", &[&hdr_ext]);

This first instantiates the header extension based on the uniquely defined URI for it, then sets its ID to 1 (see RFC 5285) and then adds it to the video payloader. The same is then done for the audio payloader.

By default this will add the header extension to every RTP packet that has a different RTP timestamp than the previous one. In other words: on the first packet that corresponds to an audio or video frame. Via properties on the header extension this can be configured but generally the default should be sufficient.

Receiver

On the receiver-side no changes would actually be necessary. The use of the header extension is signaled via the SDP (see RFC 5285) and it will be automatically made use of inside rtpbin as another source of NTP / RTP timestamp mappings in addition to the RTCP Sender Reports.

However, we configure one additional property on rtpbin

source.connect_closure(
    "new-manager",
    false,
    glib::closure!(|_rtspsrc: &gst::Element, rtpbin: &gst::Element| {
        rtpbin.set_property("min-ts-offset", gst::ClockTime::from_mseconds(1));
    }),
);

Inter-stream audio/video synchronization

The reason for configuring the min-ts-offset property on the rtpbin is that the NTP / RTP timestamp mapping is not only used for providing the reference timestamp metadata but it is also used for inter-stream synchronization by default. That is, for getting correct audio / video synchronization.

With RTP alone there is no mechanism to synchronize multiple streams against each other, as the RTP timestamps of different streams have no correlation to each other. This is not too much of a problem, as usually the packets for audio and video are received approximately at the same time, but there’s still some inaccuracy in there.

One approach to fix this is to use the NTP / RTP timestamp mapping of each stream, either from the RTCP Sender Reports or from the RTP header extension, and that’s what is made use of here. And because the mapping is provided very often via the RTP header extension, but the RTP timestamps are only accurate up to the clock rate (1/90000 s for video and 1/48000 s for audio in this case), we configure a threshold of 1ms for adjusting the inter-stream synchronization. Without this it would be adjusted almost continuously by a very small amount, back and forth.

Other approaches for inter-stream synchronization are provided by RTSP itself before streaming starts (via the RTP-Info header), but due to a bug this is currently not made use of by GStreamer.

Yet another approach would be via the clock information provided by RFC 7273, about which I already wrote previously and which is also supported by GStreamer. This also allows inter-device, network synchronization and is used for that purpose as part of e.g. AES67, Ravenna, SMPTE 2022 / 2110 and many other protocols.

Inter-device network synchronization

Now for the last part, we’re going to add actual inter-device synchronization to this example. The changes between the previous version of the code and this version can be seen here and the final code here in the network-sync branch. This does not use the clock information provided via RFC 7273 (which would be another option) but uses the same NTP / RTP timestamp mapping that was discussed above.

When starting the receiver multiple times on different (or the same) machines, each of them should play back the media synchronized to each other and exactly 2 seconds after the corresponding audio / video frames are produced on the sender.

For this, both the sender and all receivers are using an NTP clock (pool.ntp.org in this case) instead of the local monotonic system clock for media synchronization (i.e. as the pipeline clock). Instead of an NTP clock it would also be possible to use any other mechanism for network clock synchronization, e.g. PTP or the GStreamer netclock.
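The clock creation itself looks roughly like this, assuming the gstreamer-net bindings are imported as gst_net:

// Create a GStreamer NTP clock that synchronizes against pool.ntp.org
// on the standard NTP port 123
let clock = gst_net::NtpClock::new(None, "pool.ntp.org", 123, gst::ClockTime::ZERO);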

println!("Syncing to NTP clock");
clock
    .wait_for_sync(gst::ClockTime::from_seconds(5))
    .context("Syncing NTP clock")?;
println!("Synced to NTP clock");

This code instantiates a GStreamer NTP clock and then synchronously waits up to 5 seconds for it to synchronize. If that fails then the application simply exits with an error.

Sender

On the sender side all that is needed is to configure the RTSP media factory, and as such the pipeline used inside it, to use the NTP clock

factory.set_clock(Some(&clock));

This causes all media inside the sender’s pipeline to be synchronized according to this NTP clock and to also use it for the NTP timestamps in the RTCP Sender Reports and the RTP header extension.

Receiver

On the receiver side the same has to happen

pipeline.use_clock(Some(&clock));

In addition a couple more settings have to be configured on the receiver though. First of all we configure a static latency of 2 seconds on the receiver’s pipeline.

pipeline.set_latency(gst::ClockTime::from_seconds(2));

This is necessary as GStreamer can’t know the latency of every receiver (e.g. different decoders might be used), and also because the sender latency can’t be known automatically. Each audio / video frame will be timestamped on the receiver with the NTP timestamp of when it was captured / created, but since then all the latency of the sender, the network and the receiver pipeline has passed, and this has to be compensated for.

Which value to use here depends a lot on the overall setup, but 2 seconds is a (very) safe guess in this case. The value only has to be larger than the sum of sender, network and receiver latency and in the end has the effect that the receiver is showing the media exactly that much later than the sender has produced it.

And last we also have to tell rtpbin that

  1. sender and receiver clock are synchronized to each other, i.e. in this case both are using exactly the same NTP clock, and that no translation to the pipeline’s clock is necessary, and
  2. that the outgoing timestamps on the receiver should be exactly the sender timestamps and that this conversion should happen based on the NTP / RTP timestamp mapping
source.set_property_from_str("buffer-mode", "synced");
source.set_property("ntp-sync", true);

And that’s it.

A careful reader will also have noticed that all of the above would also work without the RTP header extension, but then the receivers would only be synchronized once the first RTCP Sender Report is received. That’s what the test-netclock.c / test-netclock-client.c example from the GStreamer RTSP server is doing.

As usual with RTP, the above is by far not the only way of doing this and GStreamer also supports various other synchronization mechanisms. Which one is the correct one for a specific use-case depends on a lot of factors.

Porting EBU R128 audio loudness analysis from C to Rust

Over the last few weeks I ported the libebur128 C library to Rust, both with a proper Rust API as well as a 100% compatible C API.

This blog post will be split into 4 parts that will be published over the next weeks

  1. Overview and motivation
  2. Porting approach with various details, examples and problems I ran into along the way
  3. Performance optimizations
  4. Building Rust code into a C library as drop-in replacement

If you’re only interested in the code, that can be found on GitHub and in the ebur128 crate on crates.io.

The initial versions of the ebur128 crate were built around the libebur128 C library (and included its code for ease of building); version 0.1.2 and newer are the pure Rust implementation.

EBU R128

libebur128 implements the EBU R128 loudness standard. The Wikipedia page gives a good summary of the standard, but in short it describes how to measure loudness of an audio signal and how to use this for loudness normalization.

While this intuitively doesn’t sound very complicated, there are lots of little details (like how human ears actually work) that make this not as easy as one might expect. This results in there being many different ways of measuring loudness, which is one of the reasons why this standard was introduced. Of course it is also not the only standard for this.

libebur128 is also the library that I used in the GStreamer loudness normalization plugin, about which I wrote a few weeks ago already. By porting the underlying loudness measurement code to Rust, the only remaining C dependency of that plugin is GStreamer itself.

Apart from that it is used by FFmpeg (which includes its own modified copy), as well as by many other projects that need some kind of loudness measurement and don’t use ReplayGain, an older but widely used standard for the same problem.

Why?

Before going over the details of what I did, let me first explain why I did this work at all. libebur128 is a perfectly well-working library that has been in wide use for a long time and is probably rather bug-free at this point, and it was already possible to use the C implementation from Rust just fine. That’s what the initial versions of the ebur128 crate were doing.

My main reason for doing this was simply that it seemed like a fun little project. It isn’t a lot of code and it doesn’t change often, so once ported it should be more or less finished, and it shouldn’t be much work to stay in sync with the C version. I had already started thinking about doing this after the initial release of the C-based ebur128 crate, but reading Joe Neeman’s blog post about porting another C audio library (RNNoise) to Rust gave me the final push to actually start porting the code and to follow through until it was done.

However, don’t go around and ask other people to rewrite their projects in Rust (don’t be rude) or think that your own rewrite is magically going to be much faster and less buggy than the existing implementation. While Rust saves you from a big class of possible bugs, it doesn’t save you from yourself and usually rewrites contain bugs that didn’t exist in the original implementation. Also getting good performance in Rust requires, like in every other language, some effort. Before rewriting any software, think about the goals of this rewrite realistically as well as the effort required to actually get it finished.

Apart from fun there were also a few technical and non-technical reasons for me to look into this. I’m going to just list two here (curiosity and portability). I will skip the usual Rust memory-safety argument, as that seems less important with this code: the C code has been in wide use for a long time, is not changing a lot and has easy-to-follow memory access patterns. While it definitely had a memory safety bug (see above), it was rather difficult to trigger and it was fixed in the meantime.

Curiosity

Personally, and at my company Centricular, we try to do any new projects in Rust where it makes sense. While this worked very well in the past and we got great results, there were some questions about future projects that I wanted to get answers, hard data and personal experience for:

  • How difficult is it to port a C codebase function by function to Rust while keeping everything working along the way?
  • How difficult is it to get the same or better performance with idiomatic Rust code for low-level media processing code?
  • How much bigger or smaller is the resulting code and do Rust’s higher-level concepts like iterators help to keep code concise?
  • How difficult is it to create a C-compatible library in Rust with the same API and ABI?

I have some answers to all these questions already, but previous work on this was not well structured and the results were not documented, which I’m trying to change here now. Both to have a reference for myself in the future and to convince other people that Rust is a reasonable technology choice for such projects.

As you can see, the general pattern of these questions is introducing Rust into an existing codebase, replacing existing components with Rust and writing new components in Rust, which also relates to my work on the Rust GStreamer bindings.

Portability

C is a very old language and while there is a standard, each compiler has its own quirks and each platform different APIs on top of the bare minimum that the C standard defines. C itself is very portable, but it is not easy to write portable C code, especially when not using a library like GLib that hides these differences and provides basic data structures and algorithms.

This seems to be something that is often forgotten when the portability of C is given as an argument against Rust, and that’s the reason why I wanted to mention this here specifically. While you can get a C compiler basically everywhere, writing C code that also runs well everywhere is another story and C doesn’t make this easy by design. Rust on the other hand makes writing portable code quite easy in my experience.

In practice there were three specific issues I had for this codebase. Most of the advantages of Rust here are because it is a new language and doesn’t have to carry a lot of historical baggage.

Mathematical Constants and Functions

Mathematical constants are not actually part of any C standard, although most compilers define M_PI (for π), M_E (for e) and others in math.h nonetheless, as they’re defined by POSIX and UNIX98.

Microsoft’s MSVC doesn’t, but instead you have to #define _USE_MATH_DEFINES before including math.h.

While not a big problem per se, it is annoying, and it indeed caused the initial version of the ebur128 Rust crate to not compile with MSVC because I forgot about it.
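
In Rust, by comparison, the constants are simply part of the standard library and available on every platform. For example:

use std::f64::consts::{E, PI};

fn main() {
    // π and e are plain constants, no #define dance required on any target
    println!("π = {}, e = {}", PI, E);
}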

Similarly, which mathematical functions are available depends a lot on the target platform and which version of the C standard is supported. An example of this is the log10 function for calculating the base-10 logarithm. It is only guaranteed to be available on POSIX systems and since C99, so for portability reasons libebur128 didn’t use it and instead calculated the value via the natural logarithm (log10(x) = ln(x) / ln(10)). While C99 is from 1999, there are still many compilers out there that don’t fully support it, again most prominently MSVC until very recently.

Using log10 directly instead of going via the natural logarithm is faster and more precise (the additional division introduces more floating point rounding error), which is why the Rust implementation uses it. In C, using it would require checking at build time whether the function is available, which complicates the build process and can easily be forgotten; libebur128 decided not to bother with these complications and simply not use it. Because of that, some conditional code is necessary in the Rust implementation to ensure that both implementations return the same results in the tests.
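
As a sketch of what such conditional code can look like (the feature name is made up for illustration; the actual crate may structure this differently):

// when comparing against the C implementation, mimic its ln-based log10
#[cfg(feature = "c-tests")]
fn log10_compat(x: f64) -> f64 {
    x.ln() / std::f64::consts::LN_10
}

// otherwise use the faster and more precise direct implementation
#[cfg(not(feature = "c-tests"))]
fn log10_compat(x: f64) -> f64 {
    x.log10()
}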

Data Structures

libebur128 uses a linked-list-based queue data structure. As the C standard library is very minimal, no collection data structures are included. However, on the BSDs, and also on Linux with the GNU C library, one is available in sys/queue.h.

Of course MSVC does not have this, and other compilers/platforms probably won’t have it either, so libebur128 included a local copy of that queue implementation. When building, one then has to decide whether a system implementation is available or the internal version has to be used. Or one simply always uses the internal version.

Copying implementations of basic data structures and algorithms into every single project is ugly and error-prone, so let’s maybe not do that. C not having a standardized mechanism for dependency handling doesn’t help with this, which is unfortunately why this is very common in C projects.
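
In Rust, a double-ended queue is simply part of the standard library and behaves the same on every platform:

use std::collections::VecDeque;

fn main() {
    // no copied-in sys/queue.h equivalent needed
    let mut queue: VecDeque<f64> = VecDeque::new();
    queue.push_back(1.0);
    queue.push_back(2.0);
    assert_eq!(queue.pop_front(), Some(1.0));
}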

One-time Initialization

Thread-safe one-time initialization is another thing that is not defined by the C standard, and depending on your platform there are different APIs available for it or none at all. POSIX again defines one that is widely available, but you can’t really depend on it unconditionally.

This complicates the code and build procedure, so libebur128 simply didn’t do it and performed its one-time initializations of some global arrays every time a new instance was created. That is probably fine, but a bit wasteful, and strictly speaking probably not even thread-safe according to the C standard.

The initial version of the ebur128 Rust crate side-stepped this problem by simply doing this initialization once with the API provided by the Rust standard library. See part 2 and part 3 of this blog post for some more details about this.
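
As an illustration, a minimal sketch using std::sync::OnceLock from today’s standard library (the crate predates this type and used the standard library facilities available at the time; the function and table here are made up):

use std::sync::OnceLock;

// a hypothetical lookup table, computed exactly once and thread-safely
fn lookup_table() -> &'static [f64; 64] {
    static TABLE: OnceLock<[f64; 64]> = OnceLock::new();
    TABLE.get_or_init(|| {
        let mut table = [0.0f64; 64];
        for (i, v) in table.iter_mut().enumerate() {
            *v = (i as f64).sqrt();
        }
        table
    })
}

fn main() {
    assert_eq!(lookup_table()[4], 2.0);
}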

Easier to Compile and Integrate

A Rust port only requires a Rust compiler, a mixed C/Rust codebase requires at least a C compiler in addition and some kind of build system for the C code.

libebur128 uses CMake, which would be an additional dependency, so in the initial version of the ebur128 crate I instead went via cargo‘s build.rs build scripts and the cc crate, as building libebur128 is easy enough. This works, but build scripts are problematic when integrating the Rust code into build systems other than cargo.
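
Such a build script can be quite short; a minimal sketch (assuming the bundled C sources live in a local c-src directory and cc is listed under [build-dependencies]):

// build.rs: compile the bundled C library with the cc crate
fn main() {
    cc::Build::new()
        .file("c-src/ebur128.c")
        .include("c-src")
        .compile("ebur128");
}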

The Rust port also makes use of conditional compilation in various places. Unlike C, with its preprocessor, its non-standardized and inconsistent platform #defines and the need to integrate everything into the build system in a custom way, Rust has a principled and well-designed approach to this problem. This makes it easier to keep the code clean, easier to maintain and more portable.
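
For example, platform-specific code can be selected directly in the source, without any build system involvement. A generic sketch (not the crate’s actual code):

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn set_flush_to_zero() {
    // the x86-specific MXCSR handling would go here
}

#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn set_flush_to_zero() {
    // nothing to do on architectures without this feature
}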

In addition to build system related simplifications, by not having any C code it is also much easier to compile the code to other targets like WebAssembly, which is natively supported by Rust. It is also possible to compile C to WebAssembly but getting both toolchains to agree with each other and produce compatible code seems not very easy.

Overview

As mentioned above, the code can be found on GitHub and in the ebur128 crate on crates.io.

The current version of the code produces exactly the same results as the C version. This is enforced by quickcheck tests that run randomized inputs through both versions and check that the results are the same. The code also passes all the tests in the EBU loudness test set, so it should hopefully be standards-compliant, as long as the test implementation is not wrong.
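
To give an idea of the style, here is a small self-contained quickcheck property in the same spirit. It checks the log10 equivalence mentioned earlier rather than the actual C/Rust comparison, which requires the FFI bindings (this goes into a test module and runs via cargo test):

use quickcheck::quickcheck;

quickcheck! {
    // property: direct log10 and the ln-based fallback agree closely
    fn log10_paths_agree(x: f64) -> bool {
        if !(x > 0.0) || !x.is_finite() {
            return true; // only positive, finite inputs are meaningful
        }
        let direct = x.log10();
        let via_ln = x.ln() / std::f64::consts::LN_10;
        (direct - via_ln).abs() <= 1e-12 * direct.abs().max(1.0)
    }
}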

Performance-wise the Rust implementation is at least as fast as the C implementation. In some configurations it’s a few percent faster, but probably not enough that it actually matters in practice. There are various benchmarks for both versions in different configurations available. The benchmarks are based on the criterion crate, which uses statistical methods to give results that are as accurate as possible. criterion also generates nice graphs to make analysis of the results more pleasant. See part 3 of this blog post for more details.
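
A criterion benchmark is only a few lines of code. A generic sketch (the processing function here is a stand-in, not the crate’s API):

use criterion::{black_box, criterion_group, criterion_main, Criterion};

// stand-in for the actual loudness processing under test
fn process(samples: &[f32]) -> f64 {
    samples.iter().map(|s| (*s as f64) * (*s as f64)).sum()
}

fn bench_process(c: &mut Criterion) {
    let samples = vec![0.5f32; 48_000 * 2]; // 1s of stereo audio at 48kHz
    c.bench_function("process 1s stereo", |b| {
        b.iter(|| process(black_box(&samples)))
    });
}

criterion_group!(benches, bench_process);
criterion_main!(benches);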

Writing tests and benchmarks in Rust is so much easier and feels more natural than doing it in C, so the Rust implementation now has quite good coverage of the different code paths. Especially, no struggling with build systems was necessary like it would have been in C, thanks to cargo and Rust having built-in support. This alone seems to have the potential to give Rust code, on average, better quality than similar code written in C.

It is also possible to compile the Rust implementation into a C library with the great cargo-c tool. It easily builds the code as a static/dynamic C library and installs the library, a C header file and also a pkg-config file. With this, the Rust implementation is a 100% drop-in replacement for the C libebur128. It is not even necessary to recompile existing code. See part 4 of this blog post for more details.
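
If I remember the tool’s usage correctly, building and installing works roughly along these lines (check cargo-c’s documentation for the exact flags):

cargo install cargo-c
cargo cbuild --release
cargo cinstall --release --prefix=/usr/local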

Dependencies

Apart from the Rust standard library, the Rust implementation depends on two other small and widely used crates. Unlike with C, depending on external dependencies is rather simple with Rust and cargo. The two crates in question are:

  • smallvec for dynamically sized vectors/arrays that can be stored on the stack up to a certain size and only fall back to heap allocations beyond that. This makes it possible to avoid a couple of heap allocations under normal usage.
  • bitflags, which provides a macro for implementing properly typed bitflags. This is used in the constructor of the main type for selecting the features and modes that should be enabled, which directly maps to how the C API works (just with less type-safety there). A short sketch of both follows below.
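
A hedged sketch of how these two crates are typically used (the flag names and sizes here are illustrative, not the crate’s actual definitions):

use bitflags::bitflags;
use smallvec::SmallVec;

bitflags! {
    // illustrative subset of measurement modes, in the spirit of the C API
    struct Mode: u32 {
        const MOMENTARY  = 0b001;
        const SHORT_TERM = 0b010;
        const GLOBAL     = 0b100;
    }
}

fn main() {
    let mode = Mode::MOMENTARY | Mode::GLOBAL;
    assert!(mode.contains(Mode::GLOBAL));

    // up to 8 elements stay on the stack; more than that spill to the heap
    let mut channel_gains: SmallVec<[f64; 8]> = SmallVec::new();
    channel_gains.push(1.0);
    assert_eq!(channel_gains.len(), 1);
}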

Unsafe Code

A common question when announcing a Rust port of some C library is how much unsafe code was necessary to reach the same performance as the C code. In this case there are two uses of unsafe code outside the FFI code that calls the C implementation in the tests/benchmarks and that implements the C API.

Resampler

The True Peak measurement uses a resampler to upsample the audio signal to a higher sample rate. In the innermost loop of the resampler, a statically sized ringbuffer is used.

As part of that ringbuffer, explicit indexing of a slice is needed. While the indices are already manually wrapped around when needed, the Rust compiler and LLVM can’t figure that out, so additional bounds checks plus panic handling end up in the compiled code. Apart from slowing down the loop with the additional condition, the panic code also causes the whole loop to be optimized less well.

So to get around that, unsafe indexing into the slice is used for performance reasons. While a human now has to verify the memory safety of the code instead of relying on the compiler, the code in question is simple and small enough that this shouldn’t be a problem in practice.
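
The pattern looks roughly like this simplified sketch (not the actual resampler code):

// a fixed-size ringbuffer where the index is wrapped manually
struct RingBuffer {
    buf: [f64; 128],
    pos: usize,
}

impl RingBuffer {
    fn next_sample(&mut self) -> f64 {
        self.pos = (self.pos + 1) % self.buf.len();
        // SAFETY: `pos` was just wrapped to be smaller than buf.len(),
        // so the access is always in bounds even without the check
        unsafe { *self.buf.get_unchecked(self.pos) }
    }
}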

More on this in part 2 and part 3 of this blog post.

Flushing Denormals to Zero

The other use of unsafe code is in the filter that is applied to the incoming audio signal. On x86/x86-64, the _MM_FLUSH_ZERO_ON bit is temporarily set in the MXCSR register to flush denormal floating point numbers to zero. That is, denormals (very small numbers close to zero) that result from any floating point operation are treated as zero.

This happens both for performance reasons as well as correctness reasons. Operations on denormals are generally much slower than on normalized floating point numbers. This has a measurable impact on the performance in this case.

Also, the C library does the same, and not flushing denormals to zero would lead to slightly different results. While this difference doesn’t matter in practice as it’s very, very small, it would make it harder to compare the results of both implementations as they wouldn’t be as close to each other anymore.

Doing this affects every floating point operation that happens while that bit is set, but because these are only the floating point operations performed by this crate, and because it’s guaranteed that the bit is unset again (even in case of panics) before leaving the filter, this shouldn’t cause any problems for other code.
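
A hedged sketch of the general pattern on x86-64 (simplified; the actual crate code may differ in its details, e.g. it would also need to cover 32-bit x86):

#[cfg(target_arch = "x86_64")]
fn with_flush_denormals_to_zero<T>(f: impl FnOnce() -> T) -> T {
    use std::arch::x86_64::{_mm_getcsr, _mm_setcsr, _MM_FLUSH_ZERO_ON};

    // RAII guard: restores the previous MXCSR value on normal return
    // and also when a panic unwinds through the closure
    struct Guard(u32);
    impl Drop for Guard {
        fn drop(&mut self) {
            unsafe { std::arch::x86_64::_mm_setcsr(self.0) }
        }
    }

    let old = unsafe { _mm_getcsr() };
    let _guard = Guard(old);
    unsafe { _mm_setcsr(old | _MM_FLUSH_ZERO_ON) };
    f()
}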

Additional Features

Once the C library was ported and performance was comparable to the C implementation, I briefly went through the issues reported on the C library to check whether there were any useful feature requests or bug reports that I could implement/fix in the Rust implementation. There were three, one of which I also wanted for a future project.

None of the new features are available via the C API at this point for compatibility reasons.

Resetting the State

For this one there already was a PR for the C library. Previously the only way to reset all measurements was to create a new instance, which involves new memory allocations, filter initialization, etc.

It’s easy enough to provide a reset method that does only the minimal work required to reset all measurements and restart with a fresh state, so I’ve added that to the Rust implementation.

Fix set_max_window() to actually work

This was a bug introduced in the C implementation a while ago in an attempt to prevent integer overflows when calculating the sizes of memory allocations, which would have caused memory safety bugs because less memory was allocated than expected. This fix accidentally restricted the allowed values for the maximum window size too much. There is a PR for fixing this in the C implementation.

On the Rust side this bug also existed because I simply ported over the checks. If I hadn’t ported over the checks, or had ported an earlier version without them, there fortunately wouldn’t have been any memory safety bug on the Rust side; instead, one of two things would have happened:

  1. In debug builds integer overflows cause a panic, so instead of allocating less memory than expected there would’ve been a panic immediately while setting the parameters, rather than invalid memory accesses later.
  2. In release builds integer overflows simply wrap around for performance reasons. Less memory than expected would’ve been allocated, and any later attempt to access memory outside the allocated area would’ve caused a panic.

While a panic is also not nice, it at least leads to no undefined behaviour and prevents worse things from happening.

The proper fix in this case was to not restrict the maximum window size statically but to instead check for overflows during the calculations. This is the same as what the PR for the C implementation does, but on the Rust side it is much easier because of built-in operations like checked_mul for overflow-checking multiplication. In C this requires some rather convoluted code (check the PR for details).
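
For illustration, an overflow-checked size calculation in that style (the names and the exact formula are made up, not the crate’s actual computation):

// returns None instead of overflowing when the parameters are too large
fn window_buffer_len(rate: usize, channels: usize, window_ms: usize) -> Option<usize> {
    rate.checked_mul(window_ms)?
        .checked_div(1000)?
        .checked_mul(channels)
}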

Support for Planar Audio Input

The last additional feature that I implemented was support for planar audio input, for which also a PR to the C implementation exists already.

Most of the time audio signals have the samples of each channel interleaved with each other, so for example for stereo you have an array of samples with the first sample of the left channel, the first sample of the right channel, the second sample of the left channel, etc. While this representation has some advantages, in other situations it is easier or faster to work with planar audio: the samples of each channel are stored contiguously one after another, so you have e.g. first all the samples of the left channel and only then all the samples of the right channel.

The PR for the C implementation does this with some duplication of existing macro code (which could be prevented by making the macros more complicated). On the Rust side I implemented it without any code duplication by adding an internal abstraction for iterating over interleaved/planar samples and then working with that in normal, generic Rust code. This required some minor refactoring and code reorganization but was rather painless in the end. Note that most of the change is the addition of new tests and moving some code around.

When looking at the Samples trait, the main part of this refactoring, one might wonder why I used closures instead of Rust iterators for iterating over the samples. The reason is unfortunately performance. More on this in part 3 of this blog post.
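
A hedged, much simplified sketch of what such an abstraction can look like (the trait and type names are illustrative; the crate’s actual Samples trait differs in its details):

// one callback-based trait, implemented for both memory layouts
trait Samples<T: Copy> {
    // calls `f` once per sample together with its channel number
    fn foreach_sample(&self, f: impl FnMut(usize, T));
}

struct Interleaved<'a, T> {
    data: &'a [T],
    channels: usize,
}

struct Planar<'a, T> {
    planes: &'a [&'a [T]],
}

impl<'a, T: Copy> Samples<T> for Interleaved<'a, T> {
    fn foreach_sample(&self, mut f: impl FnMut(usize, T)) {
        // frames are contiguous: iterate frame by frame, channel by channel
        for frame in self.data.chunks_exact(self.channels) {
            for (channel, &sample) in frame.iter().enumerate() {
                f(channel, sample);
            }
        }
    }
}

impl<'a, T: Copy> Samples<T> for Planar<'a, T> {
    fn foreach_sample(&self, mut f: impl FnMut(usize, T)) {
        // each channel is contiguous: iterate plane by plane
        for (channel, plane) in self.planes.iter().enumerate() {
            for &sample in plane.iter() {
                f(channel, sample);
            }
        }
    }
}

The measurement code can then be written once, generically over any S: Samples<T>, instead of once per memory layout.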

Next Part

In the next part of this blog post I will describe the porting approach in detail and also give various examples for how to port C code to idiomatic Rust, and some examples of problems I was running into.

Automatic retry on error and fallback stream handling for GStreamer sources

A very common problem in GStreamer, especially when working with live network streams, is that the source might just fail at some point. Your own network might have problems, the source of the stream might have problems, …

Without any special handling of such situations, the default behaviour in GStreamer is to simply report an error and let the application worry about handling it. The application might for example want to restart the stream, or it might simply want to show an error to the user, or it might want to show a fallback stream instead, telling the user that the stream is currently not available and then seamlessly switch back to the stream once it comes back.

Implementing all of the aforementioned is quite some effort, especially to do it in a robust way. To make it easier for applications I implemented a new plugin called fallbackswitch that contains two elements to automate this.

It is part of the GStreamer Rust plugins and also included in the recent 0.6.0 release, which can also be found on the Rust package (“crate”) repository crates.io.

Installation

For using the plugin you most likely first need to compile it yourself, unless you’re lucky enough that e.g. your Linux distribution includes it already.

Compiling it requires a Rust toolchain and GStreamer 1.14 or newer. The former you can get via rustup, for example, if you don’t have it yet; the latter either from your Linux distribution or by using the macOS, Windows, etc. binaries that are provided by the GStreamer project. Once that is done, compiling is mostly a matter of running cargo build in the utils/fallbackswitch directory and copying the resulting libgstfallbackswitch.so (or .dll or .dylib) into one of the GStreamer plugin directories, for example ~/.local/share/gstreamer-1.0/plugins.
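
Concretely, on Linux that amounts to something like the following (the target directory location assumes the usual cargo workspace layout; adjust the file extension and plugin directory for your platform):

cd utils/fallbackswitch
cargo build --release
mkdir -p ~/.local/share/gstreamer-1.0/plugins
cp ../../target/release/libgstfallbackswitch.so ~/.local/share/gstreamer-1.0/plugins/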

fallbackswitch

The first of the two elements is fallbackswitch. It acts as a filter that can be placed into any kind of live stream. It consumes one main stream (which must be live) and outputs this stream as-is if everything works well. Based on the timeout property it detects whether this main stream didn’t have any activity for the configured amount of time, or whether all data arrived too late during that time, and then seamlessly switches to a fallback stream. The fallback stream is the second input of the element and does not have to be live (but it can be).

Switching between the main stream and the fallback stream works not only for raw audio and video streams but also for compressed formats. The element will take constraints like keyframes into account when switching, and if necessary/possible also request new keyframes from the sources.

For example, to play the Sintel trailer over the network and display a test pattern if it doesn’t produce any data, the following pipeline can be constructed:

gst-launch-1.0 souphttpsrc location=https://www.freedesktop.org/software/gstreamer-sdk/data/media/sintel_trailer-480p.webm ! \
    decodebin ! identity sync=true ! fallbackswitch name=s ! videoconvert ! autovideosink \
    videotestsrc ! s.fallback_sink

Note the identity sync=true in the main stream here as we have to convert it to an actual live stream.

Now when running the above command and disconnecting from the network, the video should freeze at some point and after 5 seconds a test pattern should be displayed.

However, when using fallbackswitch the application still has to take care of handling actual errors from the main source and possibly restarting it. Waiting a bit longer after disconnecting the network with the above command will result in an error being reported, which then stops the pipeline.

To make that part easier there is the second element.

fallbacksrc

The second element is fallbacksrc, and as the name suggests it is an actual source element. When using it, the main source can be configured via a URI or by providing a custom source element. Internally it then takes care of buffering the source, converting non-live streams into live streams and restarting the source transparently on errors. The various timeouts for this can be configured via properties.

Unlike fallbackswitch, it also handles audio and video at the same time and demuxes/decodes the streams.

Currently the only fallback streams that can be configured are still images for video. For audio the element will always output silence for now, and if no fallback image is configured for video it outputs black instead. In the future I would like to add support for arbitrary fallback streams, which hopefully shouldn’t be too hard. The basic infrastructure for it is already there.

To use it in our previous example again, having a JPEG image displayed whenever the source does not produce any new data, the following can be done:

gst-launch-1.0 fallbacksrc uri=https://www.freedesktop.org/software/gstreamer-sdk/data/media/sintel_trailer-480p.webm \
    fallback-uri=file:///path/to/some/jpg ! videoconvert ! autovideosink

Now when disconnecting the network, after a while (longer than before, because fallbacksrc does additional buffering for non-live network streams) the fallback image should be shown. Unlike before, waiting longer will not lead to an error, and reconnecting the network causes the video to reappear. However, as this is not an actual live stream, playback currently starts again from the beginning. Seeking back to the previous position would be another potential feature that could be added in the future.

Overall these two elements should make it easier for applications to handle errors in live network sources. While the two elements are still relatively minimal feature-wise, they should already be usable in various real scenarios and are already used in production.

As usual, if you run into any problems or are missing some features, please create an issue in the GStreamer bug tracker.

GStreamer Rust Bindings & Plugins New Releases

It has been quite a while since the last status update for the GStreamer Rust bindings and the GStreamer Rust plugins, so the new releases last week make for a good opportunity to do so now.

Bindings

I won’t write too much about the bindings this time. The latest version as of now is 0.16.1, which means that since I started working on the bindings there were 8 major releases. In that same time there were 45 contributors working on the bindings, which seems quite a lot and really makes me happy.

Just as before, I don’t think any major APIs are missing from the bindings anymore, even for implementing subclasses of the various GStreamer types. The wide usage of the bindings in Free Software projects and commercial products also shows both the interest in writing GStreamer applications and plugins in Rust as well as that the bindings are complete enough and production-ready.

Most of the changes since the last status update involve API cleanups, usability improvements, various bugfixes and addition of minor API that was not included before. The details of all changes can be read in the changelog.

The bindings work with any GStreamer version since 1.8 (released more than 4 years ago), support APIs up to GStreamer 1.18 (to be released soon) and work with Rust 1.40 or newer.

Plugins

The biggest progress probably happened with the GStreamer Rust plugins.

There also was a new release last week, 0.6.0, which was the first release where selected plugins were also uploaded to the Rust package (“crate”) database crates.io. This makes it easy for Rust applications to embed any of these plugins statically instead of depending on them to be available on the system.

Overall there are now 40 GStreamer elements in 18 plugins by 28 contributors available as part of the gst-plugins-rs repository, one tutorial plugin with 4 elements and various plugins in external locations.

These 40 GStreamer elements are the following:

Audio
  • rsaudioecho: Port of the audioecho element from gst-plugins-good
  • rsaudioloudnorm: Live audio loudness normalization element based on the FFmpeg af_loudnorm filter
  • claxondec: FLAC lossless audio codec decoder element based on the pure-Rust claxon implementation
  • csoundfilter: Audio filter that can use any filter defined via the Csound audio programming language
  • lewtondec: Vorbis audio decoder element based on the pure-Rust lewton implementation
Video
  • cdgdec/cdgparse: Decoder and parser for the CD+G video codec based on a pure-Rust CD+G implementation, used for example by karaoke CDs
  • cea608overlay: CEA-608 Closed Captions overlay element
  • cea608tott: CEA-608 Closed Captions to timed-text (e.g. VTT or SRT subtitles) converter
  • tttocea608: CEA-608 Closed Captions from timed-text converter
  • mccenc/mccparse: MacCaption Closed Caption format encoder and parser
  • sccenc/sccparse: Scenarist Closed Caption format encoder and parser
  • dav1dec: AV1 video decoder based on the dav1d decoder implementation by the VLC project
  • rav1enc: AV1 video encoder based on the fast and pure-Rust rav1e encoder implementation
  • rsflvdemux: Alternative to the flvdemux FLV demuxer element from gst-plugins-good, not feature-equivalent yet
  • rsgifenc/rspngenc: GIF/PNG encoder elements based on the pure-Rust implementations by the image-rs project
Text
  • textwrap: Element for line-wrapping timed text (e.g. subtitles) for better screen-fitting, including hyphenation support for some languages
Network
  • reqwesthttpsrc: HTTP(S) source element based on the Rust reqwest/hyper HTTP implementations and almost feature-equivalent with the main GStreamer HTTP source souphttpsrc
  • s3src/s3sink: Source/sink element for the Amazon S3 cloud storage
  • awstranscriber: Live audio to timed text transcription element using the Amazon AWS Transcribe API
Generic
  • sodiumencrypter/sodiumdecrypter: Encryption/decryption element based on libsodium/NaCl
  • togglerecord: Recording element that allows pausing/resuming recordings easily and considers keyframe boundaries
  • fallbackswitch/fallbacksrc: Elements for handling potentially failing (network) sources, restarting them on errors/timeout and showing a fallback stream instead
  • threadshare: Set of elements that provide alternatives for various existing GStreamer elements but allow sharing the streaming threads between each other to reduce the number of threads
  • rsfilesrc/rsfilesink: File source/sink elements as replacements for the existing filesrc/filesink elements

Live loudness normalization in GStreamer & experiences with porting a C audio filter to Rust

A few months ago I wrote a new GStreamer plugin: an audio filter for live loudness normalization and automatic gain control.

The plugin can be found in the audiofx plugin as part of the GStreamer Rust plugins. It’s also included in the recent 0.6.0 release of the GStreamer Rust plugins and available from crates.io.

Its code is based on Kyle Swanson’s great FFmpeg filter af_loudnorm, about which he wrote some more technical details on his blog a few years back. I’m not going to repeat all that here, if you’re interested in those details and further links please read Kyle’s blog post.

From a very high level, the filter works by measuring the loudness of the input following the EBU R128 standard with a 3s lookahead, adjusting the gain to reach the target loudness and then applying a true peak limiter with 10ms lookahead to prevent any too-high peaks from getting through. Both the target loudness and the maximum peak can be configured via the loudness-target and max-true-peak properties, the same as in the FFmpeg filter. Unlike the FFmpeg filter, I only implemented the “live” mode and not the two-pass mode that is implemented in FFmpeg, which first measures the loudness of the whole stream and then adjusts it in a second pass.

Below I’ll describe the usage of the filter in GStreamer a bit and also some information about the development process, and the porting of the C code to Rust.

Usage

For using the filter you most likely first need to compile it yourself, unless you’re lucky enough that e.g. your Linux distribution includes it already.

Compiling it requires a Rust toolchain and GStreamer 1.8 or newer. The former you can get via rustup, for example, if you don’t have it yet; the latter either from your Linux distribution or by using the macOS, Windows, etc. binaries that are provided by the GStreamer project. Once that is done, compiling is mostly a matter of running cargo build in the audio/audiofx directory and copying the resulting libgstrsaudiofx.so (or .dll or .dylib) into one of the GStreamer plugin directories, for example ~/.local/share/gstreamer-1.0/plugins.

After that boring part is done, you can use it for example as follows to run loudness normalization on the Sintel trailer:

gst-launch-1.0 playbin \
    uri=https://www.freedesktop.org/software/gstreamer-sdk/data/media/sintel_trailer-480p.webm \
    audio-filter="audioresample ! rsaudioloudnorm ! audioresample ! capsfilter caps=audio/x-raw,rate=48000"

As can be seen above, it is necessary to put audioresample elements around the filter. The reason for that is that the filter currently only works on 192kHz input. This is a simplification for now to make it easier inside the filter to detect true peaks. You would first upsample your audio to 192kHz and then, if needed, later downsample it again to your target sample rate (48kHz in the example above). See the link mentioned before for details about true peaks and why this is generally a good idea to do. In the future the resampling could be implemented internally and maybe optionally the filter could also work with “normal” peak detection on the non-upsampled input.

Apart from that caveat the filter element works like any other GStreamer audio filter and can be placed accordingly in any GStreamer pipeline.

If you run into any problems using the code or it doesn’t work well for your use-case, please create an issue in the GStreamer bugtracker.

The process

As I wrote above, the GStreamer plugin is part of the GStreamer Rust plugins so the first step was to port the FFmpeg C code to Rust. I expected that to be the biggest part of the work, but as writing Rust is simply so much more enjoyable than writing C and I would have to adjust big parts of the code to fit the GStreamer infrastructure anyway, I took this approach nonetheless. The alternative of working based on the C code and writing the plugin in C didn’t seem very appealing to me. In the end, as usual when developing in Rust, this also allowed me to be more confident about the robustness of the result and probably reduced the amount of time spent debugging. Surprisingly, the translation was actually not the biggest part of the work, but instead I had to debug a couple of issues that were already present in the original FFmpeg code and find solutions for them. But more on that later.

The first step for porting the code was to get an implementation of the EBU R128 loudness analysis. In FFmpeg they’re using a fork of the libebur128 C library. I checked if there was anything similar for Rust already, maybe even a pure-Rust implementation of it, but couldn’t find anything. As I didn’t want to write one myself or port the code of the libebur128 C library to Rust, I wrote safe Rust bindings for that library instead. The end result of that can be found on crates.io as an independent crate, in case someone else also needs it for other purposes at some point. The crate also includes the code of the C library, making it as easy as possible to build and include into other projects.

The next step was to actually port the FFmpeg C code to Rust. In the end that was a rather straightforward translation fortunately. The latest version of that code can be found here.

The biggest difference to the C code is the usage of Rust iterators and iterator combinators like zip and chunks_exact. In my opinion this makes the code quite a bit easier to read compared to the manual iteration and array indexing in the C code, and as a side effect it should also make the code run faster in Rust, as it allows getting rid of a lot of array bounds checks.
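
As an illustration of the style (generic code, not the filter’s actual implementation), applying per-channel gains to interleaved stereo samples with chunks_exact and zip avoids any manual indexing of the input:

// apply one gain per channel to interleaved stereo audio
fn apply_gains(input: &[f64], output: &mut [f64], gains: [f64; 2]) {
    for (in_frame, out_frame) in input
        .chunks_exact(2)
        .zip(output.chunks_exact_mut(2))
    {
        out_frame[0] = in_frame[0] * gains[0];
        out_frame[1] = in_frame[1] * gains[1];
    }
}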

Apart from that, one part that was a bit inconvenient during that translation and still required manual array indexing is the usage of ringbuffers everywhere in the code. For now I wrote those like I would in C and used a few unsafe operations like get_unchecked to avoid redundant bounds checks, but at a later time I might refactor this into a proper ringbuffer abstraction for such audio processing use-cases. It’s not going to be the last time I need such a data structure. A short search on crates.io gave various results for ringbuffers but none of them seem to provide an API that fits the use-case here. Once that’s abstracted away into a nice data structure, I believe the Rust code of this filter is really nice to read and follow.

Now to the less pleasant parts, and also a small warning to all the people asking for Rust rewrites of everything: of course I introduced a couple of new bugs while translating the code although this was a rather straightforward translation and I tried to be very careful. I’m sure there is also still a bug or two left that I didn’t find while debugging. So always keep in mind that rewriting a project will also involve adding new bugs that didn’t exist in the original code. Or maybe you’re just a better programmer than me and don’t make such mistakes.

Debugging these issues that showed up while testing the code was a good opportunity to also add extensive code comments everywhere so I don’t have to remind myself every time again what this block of code is doing exactly, and it’s something I was missing a bit from the FFmpeg code (it doesn’t have a single comment currently). While writing those comments and explaining the code to myself, I found the majority of these bugs that I introduced and as a side-effect I now have documentation for my future self or other readers of the code.

Fixing the issues I introduced myself wasn’t that time-consuming either in the end, fortunately. But while writing those code comments and also doing more testing on various audio streams, I found a couple of bugs that already existed in the original FFmpeg C code. Further testing also showed that they caused quite audible distortions on various test streams. These are the bugs that unfortunately took most of the time in the whole process, but at least to my knowledge there are no known bugs left in the code now.

For one of these bugs in the FFmpeg code I provided a fix that is already merged; the other two I reported in their bug tracker.

The first one I’d be happy to provide a fix for if my approach is considered correct, but the second one I’ll leave for someone else. Porting over my Rust solution for that one will take some time and getting all the array indexing involved correct in C would require some serious focusing, for which I currently don’t have the time.

Or maybe my solutions to these problems are actually wrong, or my understanding of the original code was wrong and I actually introduced them in my translation, which also would be useful to know.

Overall, while porting the C code to Rust introduced a few new problems that had to be fixed, I would definitely do this again for similar projects in the future. It’s more fun to write, and in my opinion the resulting code is easier to read and better to maintain and extend.

The GTK Rust bindings are not ready yet? Yes they are!

When talking to various people at conferences and elsewhere over the last year, a recurring topic was that they believed the GTK Rust bindings are not ready for use yet.

I don’t know where that perception comes from but if it was true, there wouldn’t have been applications like Fractal, Podcasts or Shortwave using GTK from Rust, or I wouldn’t be able to do a workshop about desktop application development in Rust with GTK and GStreamer at the Linux Application Summit in Barcelona this Friday (code can be found here already) or earlier this year at GUADEC.

One reason I sometimes hear is that there is no support for creating subclasses of GTK types in Rust yet. While that used to be true, it is not anymore. But even more important: unless you want to create your own special widgets, you don’t need it. Many examples and tutorials in other languages make use of inheritance/subclassing for the applications’ architecture, but that’s because it is the idiomatic pattern in those languages. In Rust, other patterns are more idiomatic, and even for those examples and tutorials subclassing wouldn’t be the one and only way to design applications.

Almost everything is included in the bindings at this point, so seriously consider writing your next GTK UI application in Rust. While some minor features are still missing from the bindings, none of those should prevent you from successfully writing your application.

And if something is actually missing for your use-case or something is not working as expected, please let us know. We’d be happy to make your life easier!

P.S.

Some people are already experimenting with new UI development patterns on top of the GTK Rust bindings. So if you want to try developing a UI application but want something different than the usual signal/callback spaghetti code, also take a look at those.

GStreamer Rust bindings 0.12 and GStreamer Plugin 0.3 release

After almost 6 months, a new release of the GStreamer Rust bindings and the GStreamer plugin writing infrastructure for Rust is out. As usual this coincided with the release of all the gtk-rs crates, to make use of all the new features they contain.

Thanks to all the contributors of both gtk-rs and the GStreamer bindings for all the nice changes that happened over the last 6 months!

And as usual, if you find any bugs please report them and if you have any questions let me know.

GStreamer Bindings

For the full changelog check here.

Most changes this time were internal, especially because many user-facing changes (like Debug impls for various types) were already backported to the minor releases in the 0.11 release series.

WebRTC

The biggest change this time is probably the inclusion of bindings for the GStreamer WebRTC library.

This allows building all kinds of WebRTC applications outside the browser (or providing a WebRTC implementation for a browser), and while not as full-featured as Google’s own implementation, it interoperates well with the various browsers and generally works much better on embedded devices.

A small example application in Rust is available here.

Serde

Optionally, implementations of the serde Serialize and Deserialize traits can be enabled for various fundamental GStreamer types, including caps, buffers, events, messages and tag lists. This allows serializing them into any format that can be handled by serde (which are many!), and deserializing them back into normal Rust structs.

Generic Tag API

Previously only a strongly-typed tag API was exposed that made it impossible to use the wrong data type for a specific tag, e.g. code that tries to store a string for the track number or an integer for the title would simply not compile:

let mut tags = gst::TagList::new();
{
    let tags = tags.get_mut().unwrap();
    tags.add::<Title>(&"some title", gst::TagMergeMode::Append);
    tags.add::<TrackNumber>(&12, gst::TagMergeMode::Append);
}

While this is convenient, it made it rather complicated to work with tag lists if you only wanted to handle them in a generic way. For example by iterating over the tag list and simply checking what kind of tags are available. To solve that, a new generic API was added in addition. This works on glib::Values, which can store any kind of type, and using the wrong type for a specific tag would simply cause an error at runtime instead of compile-time.

let mut tags = gst::TagList::new();
{
    let tags = tags.get_mut().unwrap();
    tags.add_generic(&gst::tags::TAG_TITLE, &"some title", gst::TagMergeMode::Append)
        .expect("wrong type for title tag");
    tags.add_generic(&gst::tags::TAG_TRACK_NUMBER, &12, gst::TagMergeMode::Append)
        .expect("wrong type for track number tag");
}

This also greatly simplified the serde serialization/deserialization for tag lists.

GStreamer Plugins

For the full changelog check here.

gobject-subclass

The main change this time is that all the generic GObject subclassing infrastructure was moved out of the gst-plugin crate and moved to its own gobject-subclass crate as part of the gtk-rs organization.

As part of this, some major refactoring has happened that allows subclassing more different types but also makes it simpler to add new types. There are also experimental crates for adding some subclassing support to gio and gtk, and a PR for autogenerating part of the code via the gir code generator.

More classes!

The other big addition this time is that it’s now possible to subclass GStreamer Pads and GhostPads, to implement the ChildProxy interface and to subclass the Aggregator and AggregatorPad classes.

This now makes it possible to write custom mixer/muxer-style elements (or generally elements that have multiple sink pads) in Rust via the Aggregator base class, and to have custom pad types for elements to allow setting custom properties on the pads (e.g. to control the opacity of a single video mixer input).

There is currently no example for such an element, but I’ll add a very simple video mixer to the repository some time in the next weeks and will also write a blog post about it for explaining all the steps.

Improving GStreamer performance on a high number of network streams by sharing threads between elements with Rust’s tokio crate

For one of our customers at Centricular we were working on a quite interesting project. Their use-case was basically to receive an as-high-as-possible number of audio RTP streams over UDP, transcode them, and then send them out via UDP again. Due to how GStreamer usually works, they were running into some performance issues.

This blog post will describe the first set of improvements that were implemented for this use-case, together with a minimal benchmark and the results. My colleague Mathieu will follow up with one or two other blog posts with the other improvements and a more full-featured benchmark.

The short version is that CPU usage decreased by about 65-75%, i.e. allowing 3-4x more streams with the same CPU usage. Also parallelization works better and usage of different CPU cores is more controllable, allowing for better scalability. And a fixed, but configurable number of threads is used, which is independent of the number of streams.

The code for this blog post can be found here.

Table of Contents

  1. GStreamer & Threads
  2. Thread-Sharing GStreamer Elements
  3. Available Elements
  4. Little Benchmark
  5. Conclusion

GStreamer & Threads

In GStreamer, by default each source runs from its own OS thread. Additionally, for receiving/sending RTP there will be another thread in the RTP jitterbuffer, yet another thread for receiving RTCP (another source) and a last thread for sending RTCP at the right times. And RTCP has to be received and sent for both the receiver and the sender side of the pipeline, so the number of threads doubles. In sum, this gives at least 1 + 1 + (1 + 1) * 2 = 6 threads per RTP stream in this scenario. In a normal audio scenario there will be one packet received/sent e.g. every 20ms on each stream, plus an occasional RTCP packet. So most of the time all these threads are only waiting.

Apart from the obvious waste of OS resources (1000 streams would mean 6000 threads), this also brings down performance, as threads are being woken up all the time, which means that context switches have to happen basically constantly.

To solve this we implemented a mechanism to share threads. As a result we now have a fixed, but configurable, number of threads that is independent of the number of streams, and can run e.g. 500 streams just fine on a single thread on a single core, which was completely impossible before. In addition we also did some work to reduce the number of allocations for each packet, so that after startup no additional allocations happen per packet anymore for buffers. See Mathieu’s upcoming blog post for details.

In this blog post, I’m going to write about a generic mechanism for sources, queues and similar elements to share their threads between each other. For the RTP related bits (RTP jitterbuffer and RTCP timer) this was not used due to reuse of existing C codebases.

Thread-Sharing GStreamer Elements

The code in question can be found here, a small benchmark is in the examples directory and it is going to be used for the results later. A full-featured benchmark will come in Mathieu’s blog post.

This is a new GStreamer plugin, written in Rust around the tokio crate for asynchronous IO and as a general “task scheduler”.

While this could certainly also have been written in C around something like libuv, doing this kind of work in Rust is simply more productive and fun due to its safety guarantees and strong type system, which definitely reduced the amount of debugging a lot. In addition, “modern” language features like closures make working with futures much more ergonomic.

When using these elements it is important to have full control over the pipeline and its elements, and the dataflow inside the pipeline has to be carefully considered to properly configure how to share threads. For example the following two restrictions should be kept in mind all the time:

  1. Downstream of such an element, the streaming thread must never ever block for considerable amounts of time. Otherwise all other elements inside the same thread-group would be blocked too, even if they could otherwise do work now
  2. This generally all works better in live pipelines, where media is produced in real-time and not as fast as possible

Available Elements

So this repository currently contains the generic infrastructure (see the src/iocontext.rs source file) and a couple of elements:

  • a UDP source: ts-udpsrc, a replacement for udpsrc
  • an app source: ts-appsrc, a replacement for appsrc to inject packets into the pipeline from the application
  • a queue: ts-queue, a replacement for queue that is useful for adding buffering to a pipeline part. The upstream side of the queue will block if not called from another thread-sharing element, but if called from another thread-sharing element it will pause the current task asynchronously. That is, stop the upstream task from producing more data.
  • a proxysink/src element: ts-proxysrc, ts-proxysink, replacements for proxysink/proxysrc for connecting two pipelines with each other. This basically works like the queue, but split into two elements.
  • a tone generator source around spandsp: ts-tonesrc, a replacement for tonegeneratesrc. This also contains some minimal FFI bindings for that part of the spandsp C library.

All these elements have more or less the same API as their non-thread-sharing counterparts.

API-wise, each of these elements has a set of properties for controlling how it is sharing threads with other elements, and with which elements:

  • context: A string that defines which group this element belongs to. All elements with the same context are running on the same thread or group of threads.
  • context-threads: Number of threads to use in this context. -1 means exactly one thread, 1 and above use N+1 threads (1 thread for polling fds, N worker threads) and 0 sets N to the number of available CPU cores. As long as no considerable work is done in these threads, -1 has shown to be the most efficient. See also this tokio GitHub issue.
  • context-wait: Number of milliseconds that the threads will wait on each iteration. This allows reducing CPU usage even further, by handling all events/packets that arrived during that timespan at once instead of waking up the thread every time a little event happens, thus again reducing context switches.

The elements are all pushing data downstream from a tokio thread whenever data is available, assuming that downstream does not block. If downstream is another thread-sharing element and it would have to block (e.g. a full queue), it instead returns a new future to upstream so that upstream can asynchronously wait on that future before producing more output. By this, back-pressure is implemented between different GStreamer elements without ever blocking any of the tokio threads. All this is implemented around the normal GStreamer data-flow mechanisms, there is no “tokio fast-path” between elements.

Little Benchmark

As mentioned above, there’s a small benchmark application in the examples directory. This basically sets up a configurable number of streams and directly connects them to a fakesink, throwing away all packets. Additionally there is another thread that is sending all these packets. As such, this is really the most basic benchmark and not very realistic but nonetheless it shows the same performance improvement as the real application. Again, see Mathieu’s upcoming blog post for a more realistic and complete benchmark.

When running it, make sure that your user can create enough fds. The benchmark will just abort if not enough fds can be allocated. You can control this with ulimit -n SOME_NUMBER, and allowing a couple of thousands is generally a good idea. The benchmarks below were running with 10000.

After running cargo build --release to build the plugin itself, you can run the benchmark with:

cargo run --release --example udpsrc-benchmark -- 1000 ts-udpsrc -1 1 20

and in another shell the UDP sender with

cargo run --release --example udpsrc-benchmark-sender -- 1000

This runs 1000 streams, uses ts-udpsrc (the alternative would be udpsrc), and configures exactly one thread (-1), one context, and a wait time of 20ms. See above for what these settings mean. You can check CPU usage with e.g. top. Testing was done on an Intel i7-4790K, with Rust 1.25 and GStreamer 1.14. One packet is sent every 20ms for each stream.

Source     Streams  Threads  Contexts  Wait (ms)  CPU
udpsrc     1000     1000     x         x          44%
ts-udpsrc  1000     -1       1         0          18%
ts-udpsrc  1000     -1       1         20         13%
ts-udpsrc  1000     -1       2         20         15%
ts-udpsrc  1000     2        1         20         16%
ts-udpsrc  1000     2        2         20         27%

Source     Streams  Threads  Contexts  Wait (ms)  CPU
udpsrc     2000     2000     x         x          95%
ts-udpsrc  2000     -1       1         20         29%
ts-udpsrc  2000     -1       2         20         31%

Source     Streams  Threads  Contexts  Wait (ms)  CPU
ts-udpsrc  3000     -1       1         20         36%
ts-udpsrc  3000     -1       2         20         47%

Results for 3000 streams for the old udpsrc are not included as starting up that many threads needs too long.

The best configuration is apparently a single thread per context (see this tokio GitHub issue) and waiting 20ms on every iteration. Compared to the old udpsrc, CPU usage is about one third in that setting, and generally it seems to parallelize well. It’s not clear to me why the last test uses 11% more CPU with two contexts, while in every other test the number of contexts does not really make a difference, and it also doesn’t for that many streams in the real test case.

The waiting does not reduce CPU usage a lot in this benchmark, but on the real test-case it does. The reason is most likely that this benchmark basically sends all packets at once, then waits for the remaining time, then sends the next packets.

Take these numbers with caution, the real test-case in Mathieu’s blog post will show the improvements in the bigger picture, where it was generally a quarter of CPU usage and almost perfect parallelization when increasing the number of contexts.

Conclusion

Generally this was a fun exercise and we’re quite happy with the results, especially the real results. It took me some time to understand how tokio works internally so that I can implement all kinds of customizations on top of it, but for normal usage of tokio that should not be required and the overall design makes a lot of sense to me, as well as the way how futures are implemented in Rust. It requires some learning and understanding how exactly the API can be used and behaves, but once that point is reached it seems like a very productive and performant solution for asynchronous IO. And modelling asynchronous IO problems based on the Rust-style futures seems a nice and intuitive fit.

The performance measurements also showed that GStreamer’s default usage of threads is not always optimal, and a model like in upipe or pipewire (or rather SPA) can provide better performance. But as this also shows, it is possible to implement something like this on top of GStreamer and for the common case, using threads like in GStreamer reduces the cognitive load on the developer a lot.

For a future version of GStreamer, I don’t think we should make the threading “manual” like in these two other projects, but instead provide some API additions that make it nicer to implement thread-sharing elements and to add ways in the GStreamer core to make streaming threads non-blocking. All this can be implemented already, but it could be nicer.

All this “only” improved the number of threads, and thus the threading and context switching overhead. Many other optimizations in other areas are still possible on top of this, for example optimizing receive performance and reducing the number of memory copies inside the pipeline even further. If that’s something you would be interested in, feel free to get in touch.

And with that: Read Mathieu’s upcoming blog posts about the other parts, RTP jitterbuffer / RTCP timer thread sharing, and no allocations, and the full benchmark.

GStreamer Rust bindings 0.11 / plugin writing infrastructure 0.2 release

Following the GStreamer 1.14 release and the new round of gtk-rs releases, there are also new releases for the GStreamer Rust bindings (0.11) and the plugin writing infrastructure (0.2).

Thanks also to all the contributors for making these releases happen and adding lots of valuable changes and API additions.

GStreamer Rust Bindings

The main changes in the Rust bindings were the update to GStreamer 1.14 (which brings in quite some new API, like GstPromise), a couple of API additions (GstBufferPool specifically) and the addition of the GstRtspServer and GstPbutils crates. The former allows writing a full RTSP server in a couple of lines of code (with lots of potential for customizations), the latter provides access to the GstDiscoverer helper object that allows inspecting files and streams for their container format, codecs, tags and all kinds of other metadata.

The GstPbutils crate will also get other features added in the near future, like encoding profile bindings to allow using the encodebin GStreamer element (a helper element for automatically selecting/configuring encoders and muxers) from Rust.

But the biggest change, in my opinion, is some refactoring that was done to the Event, Message and Query APIs. Previously you would have to use a view on a newly created query to be able to use the type-specific functions on it:

let mut q = gst::Query::new_position(gst::Format::Time);
if pipeline.query(q.get_mut().unwrap()) {
    match q.view() {
        QueryView::Position(ref p) => Some(p.get_result()),
        _ => None,
    }
} else {
    None
}

Now you can directly use the type-specific functions on a newly created query:

let mut q = gst::Query::new_position(gst::Format::Time);
if pipeline.query(&mut q) {
    Some(q.get_result())
} else {
    None
}

In addition, the views can now dereference directly to the event/message/query itself and provide access to their API, which simplifies some code even more.

Plugin Writing Infrastructure

While the plugin writing infrastructure did not see that many changes apart from a couple of bugfixes and updating to the new versions of everything else, this does not mean that development on it stalled. Quite the opposite. The existing code works very well already and there was just no need for adding anything new for the projects I and others did on top of it, most of the required API additions were in the GStreamer bindings.

So the status here is the same as last time, get started writing GStreamer plugins in Rust. It works well!