How to write GStreamer Elements in Rust Part 1: A Video Filter for converting RGB to grayscale

This is part one of a series of blog posts that I’ll write in the next weeks, as previously announced in the GStreamer Rust bindings 0.10.0 release blog post. Since the last series of blog posts about writing GStreamer plugins in Rust ([1] [2] [3] [4]) a lot has changed, and the content of those blog posts has only historical value now, as the journey of experimentation to what exists now.

In this first part we’re going to write a plugin that contains a video filter element. The video filter can convert from RGB to grayscale, either output as 8-bit per pixel grayscale or 32-bit per pixel RGB. In addition there’s a property to invert all grayscale values, or to shift them by up to 255 values. In the end this will allow you to watch Big Bucky Bunny, or anything else really that can somehow go into a GStreamer pipeline, in grayscale. Or encode the output to a new video file, send it over the network via WebRTC or something else, or basically do anything you want with it.

Big Bucky Bunny – Grayscale

This will show the basics of how to write a GStreamer plugin and element in Rust: the basic setup for registering a type and implementing it in Rust, and how to use the various GStreamer API and APIs from the Rust standard library to do the processing.

The final code for this plugin can be found here, and it is based on the 0.1 version of the gst-plugin crate and the 0.10 version of the gstreamer crate. At least Rust 1.20 is required for all this. I’m also assuming that you have GStreamer (at least version 1.8) installed for your platform, see e.g. the GStreamer bindings installation instructions.

Table of Contents

  1. Project Structure
  2. Plugin Initialization
  3. Type Registration
  4. Type Class & Instance Initialization
  5. Caps & Pad Templates
  6. Caps Handling Part 1
  7. Caps Handling Part 2
  8. Conversion of BGRx Video Frames to Grayscale
  9. Testing the new element
  10. Properties
  11. What next?

Project Structure

We’ll create a new cargo project with cargo init –lib –name gst-plugin-tutorial. This will create a basically empty Cargo.toml and a corresponding src/ We will use this structure: will contain all the plugin related code, separate modules will contain any GStreamer plugins that are added.

The empty Cargo.toml has to be updated to list all the dependencies that we need, and to define that the crate should result in a cdylib, i.e. a C library that does not contain any Rust-specific metadata. The final Cargo.toml looks as follows

We’re depending on the gst-plugin crate, which provides all the basic infrastructure for implementing GStreamer plugins and elements. In addition we depend on the gstreamer, gstreamer-base and gstreamer-video crates for various GStreamer API that we’re going to use later, and the glib crate to be able to use some GLib API that we’ll need. GStreamer is building upon GLib, and this leaks through in various places.

With the basic project structure being set-up, we should be able to compile the project with cargo build now, which will download and build all dependencies and then creates a file called target/debug/ (or .dll on Windows, .dylib on macOS). This is going to be our GStreamer plugin.

To allow GStreamer to find our new plugin and make it available in every GStreamer-based application, we could install it into the system- or user-wide GStreamer plugin path or simply point the GST_PLUGIN_PATH environment variable to the directory containing it:

If you now run the gst-inspect-1.0 tool on the, it will not yet print all information it can extract from the plugin but for now just complains that this is not a valid GStreamer plugin. Which is true, we didn’t write any code for it yet.

Plugin Initialization

Let’s start editing src/ to make this an actual GStreamer plugin. First of all, we need to add various extern crate directives to be able to use our dependencies and also mark some of them #[macro_use] because we’re going to use macros defined in some of them. This looks like the following

Next we make use of the plugin_define! macro from the gst-plugin crate to set-up the static metadata of the plugin (and make the shared library recognizeable by GStreamer to be a valid plugin), and to define the name of our entry point function (plugin_init) where we will register all the elements that this plugin provides.

This is unfortunately not very beautiful yet due to a) GStreamer requiring this information to be statically available in the shared library, not returned by a function (starting with GStreamer 1.14 it can be a function), and b) Rust not allowing raw strings (b”blabla) to be concatenated with a macro like the std::concat macro (so that the b and \0 parts could be hidden away). Expect this to become better in the future.

The static plugin metadata that we provide here is

  1. name of the plugin
  2. short description for the plugin
  3. name of the plugin entry point function
  4. version number of the plugin
  5. license of the plugin (only a fixed set of licenses is allowed here, see)
  6. source package name
  7. binary package name (only really makes sense for e.g. Linux distributions)
  8. origin of the plugin
  9. release date of this version

In addition we’re defining an empty plugin entry point function that just returns true

With all that given, gst-inspect-1.0 should print exactly this information when running on the file (or .dll on Windows, or .dylib on macOS)

Type Registration

As a next step, we’re going to add another module rgb2gray to our project, and call a function called register from our plugin_init function.

With that our src/ is complete, and all following code is only in src/ At the top of the new file we first need to add various use-directives to import various types and functions we’re going to use into the current module’s scope

GStreamer is based on the GLib object system (GObject). C (just like Rust) does not have built-in support for object orientated programming, inheritance, virtual methods and related concepts, and GObject makes these features available in C as a library. Without language support this is a quite verbose endeavour in C, and the gst-plugin crate tries to expose all this in a (as much as possible) Rust-style API while hiding all the details that do not really matter.

So, as a next step we need to register a new type for our RGB to Grayscale converter GStreamer element with the GObject type system, and then register that type with GStreamer to be able to create new instances of it. We do this with the following code

This defines a zero-sized struct Rgb2GrayStatic that is used to implement the ImplTypeStatic<BaseTransform> trait on it for providing static information about the type to the type system. In our case this is a zero-sized struct, but in other cases this struct might contain actual data (for example if the same element code is used for multiple elements, e.g. when wrapping a generic codec API that provides support for multiple decoders and then wanting to register one element per decoder). By implementing ImplTypeStatic<BaseTransform> we also declare that our element is going to be based on the GStreamer BaseTransform base class, which provides a relatively simple API for 1:1 transformation elements like ours is going to be.

ImplTypeStatic provides functions that return a name for the type, and functions for initializing/returning a new instance of our element (new) and for initializing the class metadata (class_init, more on that later). We simply let those functions proxy to associated functions on the Rgb2Gray struct that we’re going to define at a later time.

In addition, we also define a register function (the one that is already called from our plugin_init function) and in there first register the Rgb2GrayStatic type metadata with the GObject type system to retrieve a type ID, and then register this type ID to GStreamer to be able to create new instances of it with the name “rsrgb2gray” (e.g. when using gst::ElementFactory::make).

Type Class & Instance Initialization

As a next step we declare the Rgb2Gray struct and implement the new and class_init functions on it. In the first version, this struct is almost empty but we will later use it to store all state of our element.

In the new function we return a boxed (i.e. heap-allocated) version of our struct, containing a newly created GStreamer debug category of name “rsrgb2gray”. We’re going to use this debug category later for making use of GStreamer’s debug logging system for logging the state and changes of our element.

In the class_init function we, again, set up some metadata for our new element. In this case these are a description, a classification of our element, a longer description and the author. The metadata can later be retrieved and made us of via the Registry and PluginFeature/ElementFactory API. We also configure the BaseTransform class and define that we will never operate in-place (producing our output in the input buffer), and that we don’t want to work in passthrough mode if the input/output formats are the same.

Additionally we need to implement various traits on the Rgb2Gray struct, which will later be used to override virtual methods of the various parent classes of our element. For now we can keep the trait implementations empty. There is one trait implementation required per parent class.

With all this defined, gst-inspect-1.0 should be able to show some more information about our element already but will still complain that it’s not complete yet.

Caps & Pad Templates

Data flow of GStreamer elements is happening via pads, which are the input(s) and output(s) (or sinks and sources) of an element. Via the pads, buffers containing actual media data, events or queries are transferred. An element can have any number of sink and source pads, but our new element will only have one of each.

To be able to declare what kinds of pads an element can create (they are not necessarily all static but could be created at runtime by the element or the application), it is necessary to install so-called pad templates during the class initialization. These pad templates contain the name (or rather “name template”, it could be something like src_%u for e.g. pad templates that declare multiple possible pads), the direction of the pad (sink or source), the availability of the pad (is it always there, sometimes added/removed by the element or to be requested by the application) and all the possible media types (called caps) that the pad can consume (sink pads) or produce (src pads).

In our case we only have always pads, one sink pad called “sink”, on which we can only accept RGB (BGRx to be exact) data with any width/height/framerate and one source pad called “src”, on which we will produce either RGB (BGRx) data or GRAY8 (8-bit grayscale) data. We do this by adding the following code to the class_init function.

The names “src” and “sink” are pre-defined by the BaseTransform class and this base-class will also create the actual pads with those names from the templates for us whenever a new element instance is created. Otherwise we would have to do that in our new function but here this is not needed.

If you now run gst-inspect-1.0 on the rsrgb2gray element, these pad templates with their caps should also show up.

Caps Handling Part 1

As a next step we will add caps handling to our new element. This involves overriding 4 virtual methods from the BaseTransformImpl trait, and actually storing the configured input and output caps inside our element struct. Let’s start with the latter

We define a new struct State that contains the input and output caps, stored in a VideoInfo. VideoInfo is a struct that contains various fields like width/height, framerate and the video format and allows to conveniently with the properties of (raw) video formats. We have to store it inside a Mutex in our Rgb2Gray struct as this can (in theory) be accessed from multiple threads at the same time.

Whenever input/output caps are configured on our element, the set_caps virtual method of BaseTransform is called with both caps (i.e. in the very beginning before the data flow and whenever it changes), and all following video frames that pass through our element should be according to those caps. Once the element is shut down, the stop virtual method is called and it would make sense to release the State as it only contains stream-specific information. We’re doing this by adding the following to the BaseTransformImpl trait implementation

This code should be relatively self-explanatory. In set_caps we’re parsing the two caps into a VideoInfo and then store this in our State, in stop we drop the State and replace it with None. In addition we make use of our debug category here and use the gst_info! and gst_debug! macros to output the current caps configuration to the GStreamer debug logging system. This information can later be useful for debugging any problems once the element is running.

Next we have to provide information to the BaseTransform base class about the size in bytes of a video frame with specific caps. This is needed so that the base class can allocate an appropriately sized output buffer for us, that we can then fill later. This is done with the get_unit_size virtual method, which is required to return the size of one processing unit in specific caps. In our case, one processing unit is one video frame. In the case of raw audio it would be the size of one sample multiplied by the number of channels.

We simply make use of the VideoInfo API here again, which conveniently gives us the size of one video frame already.

Instead of get_unit_size it would also be possible to implement the transform_size virtual method, which is getting passed one size and the corresponding caps, another caps and is supposed to return the size converted to the second caps. Depending on how your element works, one or the other can be easier to implement.

Caps Handling Part 2

We’re not done yet with caps handling though. As a very last step it is required that we implement a function that is converting caps into the corresponding caps in the other direction. For example, if we receive BGRx caps with some width/height on the sinkpad, we are supposed to convert this into new caps with the same width/height but BGRx or GRAY8. That is, we can convert BGRx to BGRx or GRAY8. Similarly, if the element downstream of ours can accept GRAY8 with a specific width/height from our source pad, we have to convert this to BGRx with that very same width/height.

This has to be implemented in the transform_caps virtual method, and looks as following

This caps conversion happens in 3 steps. First we check if we got caps for the source pad. In that case, the caps on the other pad (the sink pad) are going to be exactly the same caps but no matter if the caps contained BGRx or GRAY8 they must become BGRx as that’s the only format that our sink pad can accept. We do this by creating a clone of the input caps, then making sure that those caps are actually writable (i.e. we’re having the only reference to them, or a copy is going to be created) and then iterate over all the structures inside the caps and then set the “format” field to BGRx. After this, all structures in the new caps will be with the format field set to BGRx.

Similarly, if we get caps for the sink pad and are supposed to convert it to caps for the source pad, we create new caps and in there append a copy of each structure of the input caps (which are BGRx) with the format field set to GRAY8. In the end we append the original caps, giving us first all caps as GRAY8 and then the same caps as BGRx. With this ordering we signal to GStreamer that we would prefer to output GRAY8 over BGRx.

In the end the caps we created for the other pad are filtered against optional filter caps to reduce the potential size of the caps. This is done by intersecting the caps with that filter, while keeping the order (and thus preferences) of the filter caps (gst::CapsIntersectMode::First).

Conversion of BGRx Video Frames to Grayscale

Now that all the caps handling is implemented, we can finally get to the implementation of the actual video frame conversion. For this we start with defining a helper function bgrx_to_gray that converts one BGRx pixel to a grayscale value. The BGRx pixel is passed as a &[u8] slice with 4 elements and the function returns another u8 for the grayscale value.

This function works by extracting the blue, green and red components from each pixel (remember: we work on BGRx, so the first value will be blue, the second green, the third red and the fourth unused), extending it from 8 to 32 bits for a wider value-range and then converts it to the Y component of the YUV colorspace (basically what your grandparents’ black & white TV would’ve displayed). The coefficients come from the Wikipedia page about YUV and are normalized to unsigned 16 bit integers so we can keep some accuracy, don’t have to work with floating point arithmetic and stay inside the range of 32 bit integers for all our calculations. As you can see, the green component is weighted more than the others, which comes from our eyes being more sensitive to green than to other colors.

Afterwards we have to actually call this function on every pixel. For this the transform virtual method is implemented, which gets a input and output buffer passed and we’re supposed to read the input buffer and fill the output buffer. The implementation looks as follows, and is going to be our biggest function for this element

What happens here is that we first of all lock our state (the input/output VideoInfo) and error out if we don’t have any yet (which can’t really happen unless other elements have a bug, but better safe than sorry). After that we map the input buffer readable and the output buffer writable with the VideoFrameRef API. By mapping the buffers we get access to the underlying bytes of them, and the mapping operation could for example make GPU memory available or just do nothing and give us access to a normally allocated memory area. We have access to the bytes of the buffer until the VideoFrameRef goes out of scope.

Instead of VideoFrameRef we could’ve also used the gst::Buffer::map_readable() and gst::Buffer::map_writable() API, but different to those the VideoFrameRef API also extracts various metadata from the raw video buffers and makes them available. For example we can directly access the different planes as slices without having to calculate the offsets ourselves, or we get directly access to the width and height of the video frame.

After mapping the buffers, we store various information we’re going to need later in local variables to save some typing later. This is the width (same for input and output as we never changed the width in transform_caps), the input and out (row-) stride (the number of bytes per row/line, which possibly includes some padding at the end of each line for alignment reasons), the output format (which can be BGRx or GRAY8 because of how we implemented transform_caps) and the pointers to the first plane of the input and output (which in this case also is the only plane, BGRx and GRAY8 both have only a single plane containing all the RGB/gray components).

Then based on whether the output is BGRx or GRAY8, we iterate over all pixels. The code is basically the same in both cases, so I’m only going to explain the case where BGRx is output.

We start by iterating over each line of the input and output, and do so by using the chunks iterator to give us chunks of as many bytes as the (row-) stride of the video frame is, do the same for the other frame and then zip both iterators together. This means that on each iteration we get exactly one line as a slice from each of the frames and can then start accessing the actual pixels in each line.

To access the individual pixels in each line, we again use the chunks iterator the same way, but this time to always give us chunks of 4 bytes from each line. As BGRx uses 4 bytes for each pixel, this gives us exactly one pixel. Instead of iterating over the whole line, we only take the actual sub-slice that contains the pixels, not the whole line with stride number of bytes containing potential padding at the end. Now for each of these pixels we call our previously defined bgrx_to_gray function and then fill the B, G and R components of the output buffer with that value to get grayscale output. And that’s all.

Using Rust high-level abstractions like the chunks iterators and bounds-checking slice accesses might seem like it’s going to cause quite some performance penalty, but if you look at the generated assembly most of the bounds checks are completely optimized away and the resulting assembly code is close to what one would’ve written manually (especially when using the newly-added exact_chunks iterators). Here you’re getting safe and high-level looking code with low-level performance!

You might’ve also noticed the various assertions in the processing function. These are there to give further hints to the compiler about properties of the code, and thus potentially being able to optimize the code better and moving e.g. bounds checks out of the inner loop and just having the assertion outside the loop check for the same. In Rust adding assertions can often improve performance by allowing further optimizations to be applied, but in the end always check the resulting assembly to see if what you did made any difference.

Testing the new element

Now we implemented almost all functionality of our new element and could run it on actual video data. This can be done now with the gst-launch-1.0 tool, or any application using GStreamer and allowing us to insert our new element somewhere in the video part of the pipeline. With gst-launch-1.0 you could run for example the following pipelines

Note that you will likely want to compile with cargo build –release and add the target/release directory to GST_PLUGIN_PATH instead. The debug build might be too slow, and generally the release builds are multiple orders of magnitude (!) faster.


The only feature missing now are the properties I mentioned in the opening paragraph: one boolean property to invert the grayscale value and one integer property to shift the value by up to 255. Implementing this on top of the previous code is not a lot of work. Let’s start with defining a struct for holding the property values and defining the property metadata.

This should all be rather straightforward: we define a Settings struct that stores the two values, implement the Default trait for it, then define a two-element array with property metadata (names, description, ranges, default value, writability), and then store the default value of our Settings struct inside another Mutex inside the element struct.

In the next step we have to make use of these: we need to tell the GObject type system about the properties, and we need to implement functions that are called whenever a property value is set or get.

Property values can be changed from any thread at any time, that’s why the Mutex is needed here to protect our struct. And we’re using a new mutex to be able to have it locked only for the shorted possible amount of time: we don’t want to keep it locked for the whole time of the transform function, otherwise applications trying to set/get values would block for up to one frame.

In the property setter/getter functions we are working with a glib::Value. This is a dynamically typed value type that can contain values of any type, together with the type information of the contained value. Here we’re using it to handle an unsigned integer (u32) and a boolean for our two properties. To know which property is currently set/get, we get an identifier passed which is the index into our PROPERTIES array. We then simply match on the name of that to decide which property was meant

With this implemented, we can already compile everything, see the properties and their metadata in gst-inspect-1.0 and can also set them on gst-launch-1.0 like this

If we set GST_DEBUG=rsrgb2gray:6 in the environment before running that, we can also see the corresponding debug output when the values are changing. The only thing missing now is to actually make use of the property values for the processing. For this we add the following changes to bgrx_to_gray and the transform function

And that’s all. If you run the element in gst-launch-1.0 and change the values of the properties you should also see the corresponding changes in the video output.

Note that we always take a copy of the Settings struct at the beginning of the transform function. This ensures that we take the mutex only the shorted possible amount of time and then have a local snapshot of the settings for each frame.

Also keep in mind that the usage of the property values in the bgrx_to_gray function is far from optimal. It means the addition of another condition to the calculation of each pixel, thus potentially slowing it down a lot. Ideally this condition would be moved outside the inner loops and the bgrx_to_gray function would made generic over that. See for example this blog post about “branchless Rust” for ideas how to do that, the actual implementation is left as an exercise for the reader.

What next?

I hope the code walkthrough above was useful to understand how to implement GStreamer plugins and elements in Rust. If you have any questions, feel free to ask them here in the comments.

The same approach also works for audio filters or anything that can be handled in some way with the API of the BaseTransform base class. You can find another filter, an audio echo filter, using the same approach here.

In the next blog post in this series I’ll show how to use another base class to implement another kind of element, but for the time being you can also check the GIT repository for various other element implementations.

One thought on “How to write GStreamer Elements in Rust Part 1: A Video Filter for converting RGB to grayscale”

Leave a Reply

Your email address will not be published. Required fields are marked *