slomo – Page 4 – coaxion.net

Writing GStreamer plugins and elements in Rust

This is part 1, the other parts can be found here: Part 2, part 3 and part 4

This weekend we had the GStreamer Spring Hackfest 2016 in Thessaloniki, my new home town. As usual it was great meeting everybody in person again, having many useful and interesting discussions and seeing what everybody was working on. It seems like everybody was quite productive during these days!

Apart from the usual discussions, cleaning up some of our Bugzilla backlog and getting various patches reviewed, I was working with Luis de Bethencourt on writing a GStreamer plugin with a few elements in Rust. Our goal was to be able to be able to read and write a file, i.e. implement something like the “cp” command around gst-launch-1.0 with just using the new Rust elements, while trying to have as little code written in C as possible and having a more-or-less general Rust API in the end for writing more source and sink elements. That’s all finished, including support for seeking, and I also wrote a small HTTP source.

For the impatient, the code can be found here: https://github.com/sdroege/rsplugin

Why Rust?

Now you might wonder why you would want to go through all the trouble of creating a bridge between GStreamer in C and Rust for writing elements. Other people have written much better texts about the advantages of Rust, which you might want to refer to if you’re interested: The introduction of the Rust documentation, or this free O’Reilly book.

But for myself the main reasons are that

C is a rather antique and inconvenient language if you compare it to more modern languages, and Rust provides a lot of features from higher-level languages while still not introducing all the overhead that is coming with it elsewhere, and
even more important are the safety guarantees of the language, including the strong type system and the borrow checker, which make a whole category of bugs much more unlikely to happen. And thus saves you time during development but also saves your users from having their applications crash on them in the best case, or their precious data being lost or stolen.

Rust is not the panacea for everything, and not even the perfect programming language for every problem, but I believe it has a lot of potential in various areas, including multimedia software where you have to handle lots of complex formats coming from untrusted sources and still need to provide high performance.

I’m not going to write a lot about the details of the language, for that just refer to the website and very well written documentation. But, although being a very young language not developed by a Fortune 500 company (it is developed by Mozilla and many volunteers), it is nowadays being used in production already at places like Dropbox or Firefox (their MP4 demuxer, and in the near future the URL parser). It is also used by Mozilla and Samsung for their experimental, next-generation browser engine Servo.

The Code

Now let’s talk a bit about how it all looks like. Apart from Rust’s standard library (for all the basics and file IO), what we also use are the url crate (Rust’s term for libraries) for parsing and constructing URLs, and the HTTP server/client crate called hyper.

On the C side we have all the boilerplate code for initializing a GStreamer plugin (plugin.c), which then directly calls into Rust code (lib.rs), which then calls back into C (plugin.c) for registering the actual GStreamer elements. The GStreamer elements themselves have then an implementation written in C (rssource.c and rssink.c), which is a normal GObject subclass of GstBaseSrc or GstBaseSink but instead of doing the actual work in C it is just calling into Rust code again. For that to work, some metadata is passed to the GObject class registration, including a function pointer to a Rust function that creates a new instance of the “machinery” of the element. This is then implementing the Source or Sink traits (similar to interfaces) in Rust (rssource.rs and rssink.rs):

pub trait Source: Sync + Send {
    fn set_uri(&mut self, uri_str: Option<&str>) -> bool;
    fn get_uri(&self) -> Option<String>;
    fn is_seekable(&self) -> bool;
    fn get_size(&self) -> u64;
    fn start(&mut self) -> bool;
    fn stop(&mut self) -> bool;
    fn fill(&mut self, offset: u64, data: &mut [u8]) -> Result<usize, GstFlowReturn>;
    fn do_seek(&mut self, start: u64, stop: u64) -> bool;
}

pub trait Sink: Sync + Send {
    fn set_uri(&mut self, uri_str: Option<&str>) -> bool;
    fn get_uri(&self) -> Option<String>;
    fn start(&mut self) -> bool;
    fn stop(&mut self) -> bool;
    fn render(&mut self, data: &[u8]) -> GstFlowReturn;
}

And these traits (plus a constructor) are in the end all that has to be implemented in Rust for the elements (rsfilesrc.rs, rsfilesink.rs and rshttpsrc.rs).

If you look at the code, it’s all still a bit rough at the edges and missing many features (like actual error reporting back to GStreamer instead of printing to stderr), but it already works and the actual implementations of the elements in Rust is rather simple and fun. And even the interfacing with C code is quite convenient at the Rust level.

How to test it?

First of all you need to get Rust and Cargo, check the Rust website or your Linux distribution for details. This was all tested with the stable 1.8 release. And you need GStreamer plus the development files, any recent 1.x version should work.

# clone GIT repository
git clone https://github.com/sdroege/rsplugin
# build it
cd rsplugin
cargo build
# tell GStreamer that there are new plugins in this path
export GST_PLUGIN_PATH=<code>{{EJS0}}</code>
# this dumps the Cargo.toml file to stdout, doing all file IO from Rust
gst-launch-1.0 rsfilesrc uri=file://<code>{{EJS1}}</code>/Cargo.toml ! fakesink dump=1
# this dumps the Rust website to stdout, using the Rust HTTP library hyper
gst-launch-1.0 rshttpsrc uri=https://www.rust-lang.org ! fakesink dump=1
# this basically implements the "cp" command and copies Cargo.toml to a new file called test
gst-launch-1.0 rsfilesrc uri=file://<code>{{EJS2}}</code>/Cargo.toml ! rsfilesink uri=file://<code>{{EJS3}}</code>/test
# this plays Big Buck Bunny via HTTP using rshttpsrc (it has a higher rank than any
# other GStreamer HTTP source currently and is as such used for HTTP URIs)
gst-play-1.0 http://download.blender.org/peach/bigbuckbunny_movies/big_buck_bunny_480p_h264.mov

What next?

The three implemented elements are not too interesting and were mostly an experiment to see how far we can get in a weekend. But the HTTP source for example, once more features are implemented, could become useful in the long term.

Also, in my opinion, it would make sense to consider using Rust for some categories of elements like parsers, demuxers and muxers, as traditionally these elements are rather complicated and have the biggest exposure to arbitrary data coming from untrusted sources.

And maybe in the very long term, GStreamer or parts of it can be rewritten in Rust. But that’s a lot of effort, so let’s go step by step to see if it’s actually worthwhile and build some useful things on the way there already.

For myself, the next step is going to be to implement something like GStreamer’s caps system in Rust (of which I already have the start of an implementation), which will also be necessary for any elements that handle specific data and not just an arbitrary stream of bytes, and it could probably be also useful for other applications independent of GStreamer.

Issues

The main problem with the current code is that all IO is synchronous. That is, if opening the file, creating a connection, reading data from the network, etc. takes a bit longer it will block until a timeout has happened or the operation finished in one way or another.

Rust currently has no support for non-blocking IO in its standard library, and also no support for asynchronous IO. The latter is being discussed in this RFC though, but it probably will take a while until we see actual results.

While there are libraries for all of this, having to depend on an external library for this is not great as code using different async IO libraries won’t compose well. Without this, Rust is still missing one big and important feature, which will definitely be needed for many applications and the lack of it might hinder adoption of the language.

PTP network clock support in GStreamer

In the last days I was working at Centricular on adding PTP clock support to GStreamer. This is now mostly done, and the results of this work are public but not yet merged into the GStreamer code base. This will need some further testing and code review, see the related bug report here.

You can find the current version of the code here in my freedesktop.org GIT repository. See at the very bottom for some further hints at how you can run it.

So what does that mean, how does it relate to GStreamer?

Precision Time Protocol

PTP is the Precision Time Protocol, which is a network protocol standardized by the IEEE (IEEE1588:2008) to synchronize the clocks between different devices in a network. It’s similar to the better-known Network Time Protocol (NTP, IETF RFC 5905), which is probably used by millions of computers down there to automatically set the local clock. Different to NTP, PTP promises to give much more accurate results, up to microsecond (or even nanosecond with PTP-aware network hardware) precision inside appropriate networks. PTP is part of a few broadcasting and professional media standards, like AES67, RAVENNA, AVB, SMPTE ST 2059-2 and others for inter-device synchronization.

PTP comes in 3 different versions, the old PTPv1 (IEEE1588-2002), PTPv2 (IEEE1588-2008) and IEEE 802.1AS-2011. I’ve implemented PTPv2 via UDPv4 for now, but this work can be extended to other variants later.

GStreamer network synchronization support

So what does that mean for GStreamer? We are now able to synchronize to a PTP clock in the network, which allows multiple devices to accurately synchronize media to the same clock. This is useful in all scenarios where you want to play the same media on different devices, and want them all to be completely synchronized. You can probably imagine quite a few use cases for this yourself now, especially in the context of the “Internet of Things” but also for more normal things like video walls or just having multiple screens display the same thing in the same room.

This was already possible previously with the GStreamer network clock, but that clock implements a custom protocol that only other GStreamer applications can understand currently. See for example here, here or here. With the PTP clock we now get another network clock that speaks a standardized protocol and can interoperate with other software and hardware.

Performance, WiFi and other unreliable networks

When running the code, you will probably notice that PTP works very well in controlled and reliable networks (2-20 microseconds accuracy is what I got here). But it’s not that accurate in wireless networks or in general unreliable networks. It seems like in those networks the custom GStreamer network clock protocol works more reliable currently, partially by design.

Future

As a next step, at Centricular we’re going to look at implementing support for RFC7273 in GStreamer, which allows to signal media clocks for RTP. This is part of e.g. AES67 and RAVENNA and would allow multiple RTP receivers to be perfectly synchronized against a PTP clock without any further configuration. And just for completeness, we’re probably going to release a NTP based GStreamer clock in the near future too.

Running the code

If you want to test my code, you can run it for example against PTPd. And if you want to test the accuracy of the clock, you can measure it with the ptp-clock-reflector (or here, instructions in the README) that I wrote for testing. The latter allows you to measure the accuracy, and in a local wired network I got around 2-20 microseconds accuracy. A GStreamer example application can be found here, which just prints the local and remote PTP clock times. Other than that you can use it just like any other clock on any GStreamer pipeline you can imagine.

Improved GStreamer support for Blackmagic Decklink cards

In the last weeks I started to work on improving the GStreamer support for the Blackmagic Decklink cards. Decklink is Blackmagic’s product line for HDMI, SDI, etc. capture and playback cards, with drivers being available for Linux, Windows and Mac OS X. And on all platforms the same API is provided to access the devices.

GStreamer already had support for Decklink since some time in 2011, the initial plugin was written by David Schleef and seemed to work well enough for quite some time. But over the years people used GStreamer in more dynamic and complex pipelines, and we got a lot of reports recently about the plugin not working properly in such situations. Mostly there were problems with time and synchronization handling, but also other minor issues caused by the elements not using the source / sink base classes (GstBaseSrc and GstBaseSink). The latter was not easily possible because of one device providing audio and video input/output at the same time, so the elements would intuitively need two source or sink pads. And as a side effect this also made it very difficult to use the decklink elements in a standard playback pipeline with playbin.

The rewritten plugin now has separate elements for the audio and video parts of a device, giving 4 different elements in the end (decklinkaudiosrc, decklinkvideosrc, decklinkaudiosink and decklinkvideosink). These are now based on the corresponding base classes, work inside playbin (including pausing and seeking) and also handle synchronization properly. The clock of the hardware is now exposed to the pipeline as a GstClock and even if the pipeline chooses a different clock, the elements will slave their internal clock to the master clock of the pipeline and make sure that synchronization is even kept if the pipeline clock and the Decklink hardware clock are drifting apart. Thanks to GStreamer’s extensive synchronization model and all the functionality that already exists in the GstClock class, this was much easier than it sounds.

The only downside of this approach is, that it is now necessary to always have the video and audio element of one device inside the same pipeline, and also keep their states managed by the pipeline or at least make sure that they both go to PLAYING state together. Also the audio elements won’t work without a corresponding video element, which is a limitation of the hardware. Video only works without problems though.

All the code is available from gst-plugins-bad GIT master.

And now?

So what’s next? Testing, a lot of testing! Especially with different hardware, on different platforms and in all kinds of situations. If you have one of these cards, please test the latest code that is available in gst-plugins-bad GIT master. Please report any bugs you notice in Bugzilla, ideally with information about your hardware and platform and including a GStreamer debug log with GST_DEBUG=decklink*:6.

Apart from that many features are still missing, for example all the different configurations that are available via an API should be exposed on the elements, or support for more audio channels and more video modes. If you need any of these features and want to help, feel free to write patches and provide them in Bugzilla.

Any help would be highly appreciated!

Web Engines Hackfest 2014

During the last days I attended the Web Engines Hackfest 2014 in A Coruña, which was kindly hosted by Igalia in their office. This gave me some time to work again on WebKit, and especially improve and clean up its GStreamer media backend, and even more important of course an opportunity to meet again all the great people working on it.

Apart from various smaller and bigger cleanups, and getting 12 patches merged (thanks to Philippe Normand and Gustavo Noronha reviewing everything immediately), I was working on improving the WebAudio AudioDestination implementation, which allows websites to programmatically generate audio via Javascript and output it. This should be in a much better state now.

But the biggest chunk of work was my attempt for a cleaner and actually working reimplementation of the Media Source Extensions. The Media Source Extensions basically allow a website to provide a container or elementary stream for e.g. the video tag via Javascript, and can for example be used to implement custom streaming protocols. This is still far from finished, but at least it already works on YouTube with their Javascript DASH player and should give a good starting point for finishing the implementation. I hope to have some more time for continuing this work, but any help would be welcome.

Next to the Media Source Extensions, I also spent some time on reviewing a few GStreamer related patches, e.g. the WebAudio AudioSourceProvider implementation by Philippe, or his patch to use GStreamer’s OpenGL library directly in WebKit instead of reimplementing many pieces of it.

I also took a closer look at Servo, Mozilla’s research browser engine written in Rust. It looks like a very promising and well designed project (both, Servo and Rust actually!). I’m sure I’ll play around with Rust in the near future, and will also try to make some time available to work a bit on Servo. Thanks also to Martin Robinson for answering all my questions about Servo and Rust.

And then there were of course lots of discussions about everything!

Good things are going to happen in the future, and WebKit still seems to be a very active project with enthusiastic people 🙂
I hope I’ll be able to visit next year’s Web Engines Hackfest again, it was a lot of fun.

GStreamer remote-controlled testing application for Android, iOS and more

Today I’ve released a little GStreamer testing application, mostly for iOS and Android. You can find it here: gst-launch-remote.

It starts a TCP server on port 9123 and allows to take commands from there, while the main application is a simple video widget and a play/pause button.

You can use it as following (note that every OK or NOK comes from the application to tell you about the success or failure of a command):

$ nc fancydevice 9123
videotestsrc ! autovideosink
OK
+PLAY
OK
+PAUSE
OK
+SEEK 10000
OK
audiotestsrc ! autoaudiosink
OK
+PLAY

Every GStreamer pipeline string is accepted here, but currently only a single video output is supported. And additionally you can also enable sending GStreamer debug output via UDP to some IP/port with:

+DEBUG bigworkstation:12345
OK

This will for now enable GST_LEVEL_DEBUG as debug level for everything. And you can listen for all the output with

$ nc -l -u 12345

Future ideas

Maybe this is something that could be integrated with gst-validate to be able to run the test scenarios on mobile devices, and get decent test coverage for them too.

GStreamer with hardware video codecs on iOS

Update: GIT master of cerbero should compile fine with XCode 6 for x86/x86-64 (simulator) too now

In the last few days I spent some time on getting GStreamer to compile properly with the XCode 6 preview release (which is since today available as a stable release), and make sure everything still works with iOS 8. This should be the case now with GIT master of cerbero.

So much for the boring part. But more important, iOS 8 finally makes the VideoToolbox API available as public API. This allows us to use the hardware video decoders and encoders directly, and opens lots of new possibilities for GStreamer usage on iOS. Before iOS 8 it was only possible to directly decode local files with the hardware decoders via the AVAssetReader API, which of course only allows rather constrained GStreamer usage.

We already had elements (for OS X) using the VideoToolbox API in the applemedia plugin in gst-plugins-bad, so I tried making them work on iOS too. This required quite a few changes, and in the end I rewrote big parts of the encoder element (which should also make it work better on OS X btw). But with GIT master of GStreamer you can now directly use the hardware codecs on iOS 8 by using the vtdec decoder element or the vtenc_h264 encoder element. There’s still a lot of potential for improvements but it’s working.

Notes

If you compile everything from GIT master, it should still be possible to use the same application binary with iOS 7 and earlier versions. Just make sure to use “-weak_framework VideoToolbox” for linking your application instead of “-framework VideoToolbox”. On earlier versions you just won’t be able to use the hardware codecs.

Currently compiling cerbero GIT master for iOS x86 and x86-64 will fail in libffi. Only the ARM variants work. So don’t build with “./cerbero-uninstalled -c config/cross-ios-universal.cbc” but the “cross-ios-arm7.cbc”. And if you need to run bootstrap first, run it from the 1.4 branch for now and then switch back to the master branch. I’m working on fixing that next week.

Concatenate multiple streams gaplessly with GStreamer

Earlier this month I wrote a new GStreamer element that is now integrated into core and will be part of the 1.6 release. It solves yet another commonly asked question on the mailing lists and IRC: How to concatenate multiple streams without gaps between them as if they were a single stream. This is solved by the concat element now.

Here are some examples about how it can be used:

# 100 frames of the SMPTE test pattern, then the ball pattern
gst-launch-1.0 concat name=c ! videoconvert ! videoscale ! autovideosink  videotestsrc num-buffers=100 ! c.   videotestsrc num-buffers=100 pattern=ball ! c.

# Basically: $ cat file1 file2 > both
gst-launch-1.0 concat name=c ! filesink location=both   filesrc location=file1 ! c.   filesrc location=file2 ! c.

# Demuxing two MP4 files with h264 and passing them through the same decoder instance
# Note: this works better if both streams have the same h264 configuration
gst-launch-1.0 concat name=c ! queue ! avdec_h264 ! queue ! videoconvert ! videoscale ! autovideosink   filesrc location=1.mp4 ! qtdemux ! h264parse ! c.   filesrc location=2.mp4 ! qtdemux ! h264parse ! c.

If you run this in an application that also reports time and duration you will see that concat preserves the stream time, i.e. the position reporting goes back to 0 when switching to the next stream and the duration is always the one of the current stream. However the running time will be continuously increasing from stream to stream.

Also as you can notice, this only works for a single stream (i.e. one video stream or one audio stream, not a container stream with audio and video). To gaplessly concatenate multiple streams that contain multiple streams (e.g. one audio and one video track) one after another a more complex pipeline involving multiple concat elements and the streamsynchronizer element will be necessary to keep everything synchronized.

Details

The concat element has request sinkpads, and it concatenates streams in the order in which those sinkpads were requested. All streams except for the currently playing one are blocked until the currently playing one sends an EOS event, and then the next stream will be unblocked. You can request and release sinkpads at any time, and releasing the currently playing sinkpad will cause concat to switch to the next one immediately.

Currently concat only works with segments in GST_FORMAT_TIME and GST_FORMAT_BYTES format, and requires all streams to have the same segment format.

From an application side you could implement the same behaviour as concat implements by using pad probes (waiting for EOS) and using pad offsets (gst_pad_set_offset()) to adjust the running times. But by using the concat element this should be a lot easier to implement.

GStreamer Playback API

Update: the code is now also available on GitHub which probably makes it easier for a few people to use it and contribute. Just send pull requests or create issues in the issue tracker of GitHub.

Over the last years I noticed that I was copying too much code to create simple GStreamer based playback applications. After talking to other people at GUADEC this year it was clear that this wasn’t only a problem on my side but a general problem. So here it is, a convenience API for creating GStreamer based playback applications: GstPlayer.

The API is really simple but is still missing many features:

GstPlayer *  gst_player_new       (void);

void         gst_player_play      (GstPlayer * player);
void         gst_player_pause     (GstPlayer * player);
void         gst_player_stop      (GstPlayer * player);
void         gst_player_seek      (GstPlayer * player, GstClockTime position);
void         gst_player_set_uri   (GstPlayer * player, const gchar * uri);
...

Additionally to that there are a few other properties (which are not only exposed as setters/getters but also as GObject properties), and signals to be notified about position changes, errors, end-of-stream and other useful information. You can find the complete API documentation here. In general the API is modeled similar to other existing APIs like Android’s MediaPlayer and iOS’ AVPlayer.

Included are also some very, very simple commandline, GTK+, Android (including nice Java bindings) and iOS apps. An APK for the Android app can also be found here. It provides no way to start a player, but whenever there is a video or audio file to be played it will be proposed as a possible application via the Android intent system.

In the end the goal is to have a replacement for most of the GStreamer code in e.g. GNOME‘s Totem video player, Enlightenment‘s Emotion or really any other playback application, and then have it integrated in a gst-plugins-base library (or a separate module with other convenience APIs).

While this is all clearly only the start, I hope that people already take a look at this and consider using it for their projects, provide patches, or help making the included sample apps really useful and nice-looking. Apps for other platforms (e.g. a Qt app, or one written in other languages like C# or Python) would also be nice to have. And if you’re an Android or iOS or Qt developer and have no idea about GStreamer you can still help by creating an awesome user interface 🙂 Ideally I would like to get the Android and iOS app into such a good shape that we can upload them to the app stores as useful GStreamer playback applications, which we could then also use to point people to a good demo.

If you’re interested and have some time to work on it or try it, please get in contact with me.

HTTP Adaptive Streaming with GStreamer

Let’s talk a bit about HTTP Adaptive streaming and GStreamer, what it is and how it works. Especially the implementation in GStreamer is not exactly trivial and can be a bit confusing at first sight.

If you’re just interested in knowing if GStreamer supports any HTTP adaptive streaming protocols and which you can stop after this paragraph: yes, and there are currently elements for handling HLS, MPEG DASH and Microsoft SmoothStreaming.

What is it?

So what exactly do I mean when talking about HTTP Adaptive Streaming. There are a few streaming protocols out there that basically work the following way:

You download a Manifest file with metadata about the stream via HTTP. This Manifest contains the location of the actual media, possibly in multiple different bitrates and/or resolutions and/or languages and/or separate audio or subtitle streams or any other different type of variant. It might also contain the location of additional metadata or sub-Manifests that provide more information about a specific variant.
The actual media are also downloaded via HTTP and split into fragments of a specific size, usually 2 seconds to 10 seconds. Depending on the actual protocol these separate fragments can be played standalone or need additional information from the Manifest. The actual media is usually a container format like MPEG TS or a variant of ISO MP4.

Examples of this are HLS, MPEG DASH and Microsoft SmoothStreaming.

This kind of protocols are used for both video-on-demand and “live” treaming, but obviously can’t provide low-latency live streams. When used for live streaming it is usually required to add more than one fragment of latency and then download one or more fragments, reload the playlist to get the location of the next fragments and then download those.

Why?

You might wonder why one would like to implement such a complicated protocol on top of HTTP that can’t even provide low-latency live streaming and why one would choose it over other streaming protocols like RTSP, or any RTP based protocol, or simply serving a stream over HTTP at a single location.

The main reason for this is that these other approaches don’t allow usage of the HTTP based CDNs that are deployed via thousands of servers all around the world, and that they don’t allow using their caching mechanisms. One would need to deploy a specialised CDN just for this other kind of streaming protocol.

A secondary reason is that especially UDP based protocols like RTP but also anything else not based on HTTP can be a bit complicated to deploy because of all the middle boxes (e.g. firewalls, NATs, proxies) that exist in almost any network out there. HTTP(S) generally works everywhere, in the end it’s the only protocol most people conciously use or know about.

And yet another reason is that this splitting of the streams in the fragments allows trivial to implement switching between bitrates or any other stream alternatives at fragment boundaries.

GStreamer client-side design

In GStreamer the above mentioned three protocols are implemented and after trying some different approaches for implementing this kind of protocol, all of them converged to a single design. This is what I’m going to describe now.

Source

The naive approach would consider implementing all this as a single source element. Because that’s how network protocols are usually implemented in GStreamer, right?

While this seems to make sense there’s one problem with this. There is no separate URI scheme defined for such HTTP adaptive streams. They’re all using normal HTTP URIs that point to the Manifest file. And GStreamer chooses the source element that should be used based on the URI scheme alone, and especially not from the data that would be received from that URI.

So what are we left with? HTTP URIs are handled by the standard GStreamer elements for HTTP, and using such a source element gives us a stream containing the Manifest. To make any sense of this collection of bytes and detect the type of it, we additionally have to implement a typefinder that provides the media type based on looking at the data and actually tells us that this is e.g. a MPEG DASH Manifest.

Demuxer

This Manifest stream is usually rather short and followed by the EOS event like every other stream. We now need another element that does something with this Manifest and implements the specific HTTP adaptive streaming protocol. In GStreamer terminology this would act like a demuxer, it has one stream of input and outputs one or more streams based on that. Strictly speaking it’s not really a demuxer though, it does not demultiplex the input stream into the separate streams but that’s just an internal detail in the end.

This demuxer now has to wait until it received the EOS event of the Manifest, then needs to parse it and do whatever the protocol defines. In specific it starts a second thread that handles the control flow of the protocol. This thread downloads any additional resources specified in the Manifest, decides which media fragment(s) are supposed to be downloaded next, starts downloading them and makes sure they leave the demuxer on the source pads in a meaningful way.

The demuxer also has to handle the SEEK event, and based on the time specified in there jump to a different media fragment.

Examples of such demuxers are hlsdemux, dashdemux and mssdemux.

Downloading of data

For downloading additional resources listed in the Manifest that are not actual media fragments (sub-Manifests, reloading the Manifest, headers, encryption keys, …) there is a helper object called GstURIDownloader, which basically provides a blocking (but cancellable) API like this: GstBuffer * buffer = fetch_uri(uri)
Internally it creates a GStreamer source element based on the URI, starts it with that URI, collects all buffers and then returns all of them as a single buffer.

Initially this was also used to download media fragments but it was noticed that this is not ideal. Mostly because the demuxer will have to download a complete fragment before it can be passed downstream, and very huge buffers are passed downstream that cause any buffering
elements to mostly jump between 0% and 100% all the time.

Instead, thanks to the recent work of Thiago Santos, the media fragments are now downloaded differently. The demuxer element is actually a GstBin now and it has a child element that is connected to the source pad: a source element for downloading the media fragments.

This allows to forward the data as it is downloaded immediately and allows downstream elements to handle the data one block after another instead of getting it in multi-second chunks. Especially it also means that playback can start much sooner as you don’t have to wait for a complete fragment but can fill up your buffers with a partial fragment already.

Internally this is a bit tricky as the demuxer will have to catch EOS events from the source (we don’t want to stop streaming just because a fragment is done), catch errors and other messages (maybe instead of forwarding the error we want to retry downloading this media fragment) and switch between different source elements (or just switch the URI of the existing one) during streaming once a media fragment is finished. I won’t describe this here, best to look at the code for that.

Switching between alternatives

Now what happens if the demuxer wants to switch between different alternative streams, e.g. because it has noticed that it can barely keep up with downloading streams of a high bitrate and wants to switch to a lower bitrate. Or event to an alternative that is audio-only. Here the demuxer has to select the next media fragment for the chosen alternative stream and forward that downstream.

But it’s of course not that simple because currently we don’t support renegotiation of decoder pipelines. It could easily happen that codecs change between different alternatives, or the topology changes (e.g. the video stream disappears). Note that not supporting automatic renegotiation for cases like this in decodebin and related elements is not a design deficit of GStreamer but just a limitation of the current implementation.

There already is a similar case that is handled in GStreamer already, which is handling of chained Ogg files (i.e. different Ogg files concatenated to each other). In that case it should also behave like a single stream, but codecs could change or the topology could change. Here the demuxer just has to add new pads for the following streams after having emitted the no-more-pads signal, and then remove all the old pads. decodebin and playbin then first drain the old streams and then handle the new ones, while making sure their times align perfectly with the old ones.

(uri)decodebin and playbin

If we now look at the complete (source) pipeline that is created by uridecodebin for this case we come up with the following (simplified):

We have a source element for the Manifest, which is added by uridecodebin. uridecodebin also uses a typefinder to detect that this is actually an HTTP adapative streaming Manifest. This is then connected to a decodebin instance, which is configured to do buffering after the demuxers with the multiqueue element. In the normal HTTP stream buffering, the buffering is done between the source element and decodebin with the queue2 element.

decodebin then selects an HTTP adaptive streaming protocol demuxer for the current protocol, waits until it has decided on what to output and the connects it to a multiqueue element like every other demuxer. However it uses different buffering settings as this demuxer is going to behave a bit different.

Followed by that is the actual demuxer for the media fragments, which could for example be tsdemux if they’re MPEG TS media fragments. If an elementary stream is used for the media fragment, decodebin will insert a parser for that elementary stream which will chunk the stream into codec frames and also put timing information on them.

Afterwards there comes another multiqueue element, which does not happen after a parser if no HTTP adapative streaming demuxer is used. decodebin will configure this multiqueue element to handle buffering and send buffering messages to the application to allow it to pause playback until the buffer is full.

In playbin and playsink no special handling is necessary. Now let’s take a look at a complete pipeline graph for an HLS stream that contains one video and one audio stream inside MPEG TS containers. Note: this is huge, like all playbin pipelines. Everything on the right half is for converting raw audio/video and postprocessing it, and then outputting it on the display and speakers.

Keep in mind that the audio and video streams, and also subtitle streams, could also be in separate media fragments. In that case the HTTP adaptive streaming demuxer would have multiple source pads, each of them followed by a demuxer or parser and multiqueues. And decodebin would aggregate the buffering messages of each of the multiqueues to give the application a consistent view of the buffering status.

Possible optimisations

Now all of this sounds rather complex and probably slow compared to the naive approach of just implementing all this in a single source element and not having so many different elements involved here. As time as shown it actually is not slow at all, and if you consider a design for
such a protocol in a single element you will notice that all the different components that are separate elements here will also show up in your design. But as GStreamer is like Lego and we like having lots of generic components that are put together to build a more complex whole, the current design seems to follow the idea of GStreamer in a more consistent way. And especially it is possible to reuse lots of existing elements and also allow to replace different elements with custom implements. Transparently due to GStreamer’s autoplugging mechanisms.

So let’s talk about a few possible optimisations here, some of which are implemented in the default GStreamer elements and should be kept in mind when replacing elements. Some of which could be implemented on top of the existing elements.

Keep-alive Connections and other HTTP features

All these HTTP adaptive streaming protocols require to create lots of HTTP requests, which traditionally required to create a new TCP connection for every single request. This involves quite some overhead and increases the latency because of TCP’s handshake protocol. Even worse if you use HTTPS and have to also handle the SSL/TLS handshake protocol on top of that. And we’re talking about multiple 100ms to seconds per connection setup here. HTTP 1.1 allows connections to be kept alive for some period of time and reuse them for multiple HTTP requests. Browsers are using this since a long time already to efficiently show you websites composed of many different files with low latency.

Also previously all GStreamer HTTP source elements closed their connection(s) when going back to the READY state, but it is required to set them to the READY state to switch URIs. Which basically means
that although HTTP 1.1 allows to reuse connections for multiple requests, we were not able to make use of this. Now the souphttpsrc HTTP source element keeps connections open until it goes back to the NULL state if the keep-alive property is set to TRUE, and other HTTP source elements could implement this too. The HTTP adaptive streaming demuxers are making use of this “implicit interface” to reuse connections for multiple requests as much as possible.

Compression

HTTP also defines a way for clients and servers to negotiate the encoding that both support. Especially this allows both to negotiate that the actual data (the response body) should be compressed with gzip (or another method) instead of transferring it as plaintext. For media fragments this is not very useful, for Manifests this can be very useful. Especially in the case of HLS, where the Manifest is a plaintext, ASCII file that can easily be a few 100 kb in size.

The HTTP adaptive streaming demuxers are using another “implicit interface” on the HTTP source element to enable compression (if supported by the server) for Manifest files. This is also currently only handled in the souphttpsrc element.

Other minor features

HTTP defines many other headers, and the HTTP adaptive streaming demuxers make use of two more if supported by the HTTP source element. The “implicit interface” for setting more headers in the HTTP request is the extra-headers property, which can be set to arbitrary headers.

The HTTP adaptive streaming demuxers are currently setting the Referer header to the URI of the Manifest file, which is not mandated by any standard to my knowledge so far but there are streams out there that actually forbid to download media fragments without that. And then the demuxers also set the Cache-Control header to 1) tell caches/proxies to update their internal copy of the Manifest file when redownloading it and b) to tell caches/proxies that some requests must not be cached (if indicated so in the Manifest). The latter can be ignored by the caches of course.

If you implement your own HTTP source element it is probably a good idea to copy the interface of the souphttpsrc element at least for these properties.

Caching HTTP source

Another area that could easily be optimised is implementing a cache for the downloaded media fragments and also Manifests. This is especially useful for video-on-demand streams, and even more when the user likes to seek in the stream. Without any cache it would be required to download all media fragments again after seeking, even if the seek position was already downloaded before.

A simple way for implementing this is a caching HTTP source element. This basically works like an HTTP cache/proxy like Squid, only one level higher. It behaves as if it is an HTTP source element, but actually it does magic inside.

From a conceptual point of view this caching HTTP source element would implement the GstURIHandler interface and handle the HTTP(S) protocols, and ideally also implement some of the properties of souphttpsrc as mentioned above. Internally it would actually be a GstBin that dynamically creates a pipeline for downloading an URI when transitioning from the READY to the PAUSED state. It could have the following internal configurations:

The only tricky bits here are proxying of the different properties to the relevant internal elements, and error handling. You probably don’t want to stop the complete pipeline if for some reason writing to your cache file fails, or reading from your cache file fails. For that you could catch the error messages and also intercept data flow (to ignore error flow returns from filesink), and then dynamically reconfigure the internal pipeline as if nothing has happened.

Based on the Cache-Control header you could implement different functionality, e.g. for refreshing data stored in your cache already. Based on the Referer header you could correlate URIs of media fragments to their corresponding Manifest URI. But how to actually implement the storage, cache invalidation and pruning is a standard problem of computer science and covered elsewhere already.

Creating Streams – The Server Side

And in the end some final words about the server side of things. How to create such HTTP adaptive streams with GStreamer and serve them to your users.

In general this is a relatively simple task with GStreamer and the standard tools and elements that are already available. We have muxers for all the relevant container formats (one for the MP4 variant used in MPEG DASH is available in Bugzilla), there is API to request keyframes at specific positions (so that you can actually start a new media fragment at that position) and the dynamic pipeline mechanisms allow for dynamically switching muxers to start a new media fragment.

On top of that it would only be required to create the Manifest files, and then serve all of this with some HTTP server. For which there are already too many implementations out there, and in the end you want to hide your server behind a CDN anyway.

Additionally there are the hlssink and dashsink elements, the latter one just being in Bugzilla right now. These implement the creation of the media fragments already with the above mentioned GStreamer tools and generate a Manifest file for you.

And then there is the GStreamer Streaming Server, which also has support for HTTP adaptive streaming protocols and can be used to easily serve multiple streams. But that one deserves its own article.

OpenGL support in GStreamer

Over the last few months Matthew Waters, Julien Isorce and to some lesser degree myself worked on integrating proper OpenGL support into GStreamer.

Previously there were a few sinks based on OpenGL (osxvideosink for Mac OS X and eglglessink for Android and iOS), but they all only allowed rendering to a window. They did not allow rendering of a video into a custom texture that is then composited inside the application into an OpenGL scene. And then there was gst-plugins-gl, which allowed more flexible handling of OpenGL inside GStreamer pipelines, including uploading and downloading of video frames to the GPU, provided various filters and base classes to easily implement shader-based filters, provided infrastructure for sharing OpenGL contexts between different elements (even if they run in different threads) and also provided a video sink. The latter was now improved a lot, ported to all the new features for hardware integration and finally merged into gst-plugins-bad. Starting with GStreamer 1.4 in a few weeks, OpenGL will be a first-class citizen in GStreamer pipelines.

After yesterday’s addition of EAGL support for iOS (EAGL is Apple’s iOS API for handling GLES contexts), there is nothing missing to use this new set of library and plugins on all platforms supported by GStreamer. And finally we can get rid of eglglessink, which was only meant as an intermediate solution until we have all the infrastructure for real OpenGL support.