Writing GStreamer Elements in Rust (Part 3): Parsing data from untrusted sources like it’s 2016

This is part 3, the older parts can be found here: part 1 and part 2

And again it took quite a while to write a new update about my experiments with writing GStreamer elements in Rust. The previous articles can be found here and here. Since last time, there was also the GStreamer Conference 2016 in Berlin, where I had a short presentation about this.

Progress was rather slow unfortunately, due to work and other things getting in the way. Let’s hope this improves. Anyway!

There will be three parts again. The first will be about parsing data in general, especially from untrusted sources. The second will be about my experimental, currently proof-of-concept FLV demuxer. And the last one, about state handling / state machines, is something where I could especially use some suggestions from more experienced Rust developers about how to solve it in a nicer way.

Parsing Data

Safety?

First of all, you probably all saw a couple of CVEs about security-relevant bugs in (rather uncommon) GStreamer elements going around. While all of them would’ve been prevented by having the code written in Rust (due to by-default array bounds checking), that’s not going to be our topic here. They also would’ve been prevented by using the various GStreamer helper APIs, like GstByteReader, GstByteWriter and GstBitReader. So just use those, really. Especially in new code (which is exactly the problem with the code affected by the CVEs, it was old and forgotten). Don’t do an accountant’s job, counting how much money/many bytes you have left to read.

But yes, this is something where Rust will also provide an advantage by having by-default safety features. It’s not going to solve all our problems, but at least some classes of problems. And sure, you can write safe C code if you’re careful, but I’m sure you also drive with a seatbelt although you can drive safely. To quote Federico about his motivation for rewriting (parts of) librsvg in Rust:

Every once in a while someone discovers a bug in librsvg that makes it all the way to a CVE security advisory, and it’s all due to using C. We’ve gotten double free()s, wrong casts, and out-of-bounds memory accesses. Recently someone did fuzz-testing with some really pathological SVGs, and found interesting explosions in the library. That’s the kind of 1970s bullshit that Rust prevents.

You can directly replace the word librsvg with GStreamer here.

Ergonomics

The other aspect of parsing data is that it’s usually a very boring part of programming. It should be as painless as possible, as easy as possible to do in a safe way, and after having written your 100th parser by hand you probably don’t want to do that again. Parser combinator libraries like Parsec in Haskell provide a nice alternative. You essentially write down something very close to a formal grammar of the format you want to parse, and out of this comes a parser for the format. Unlike parser generators like good old yacc, everything is written in the target language though, and there is no separate code generation step.

Rust, being quite a bit more expressive than C, also made people write parser combinator libraries. None of them are as ergonomic (yet?) as in Haskell, but they are still a big improvement over anything else. There’s nom, combine and chomp, each with a slightly different approach. Choose your favorite. I decided on nom for the time being.
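
To give an idea of what this looks like, here is a small sketch of a parser written against the macro-based API nom had at the time (roughly nom 2.x). It parses a made-up header consisting of a magic string, a version byte and a big-endian 32 bit offset; it is purely illustrative and not taken from the FLV library:

    #[macro_use]
    extern crate nom;

    use nom::{be_u8, be_u32};

    #[derive(Debug)]
    pub struct Header {
        pub version: u8,
        pub data_offset: u32,
    }

    // The grammar reads almost like a specification: an "FLV" magic, one
    // version byte and a big-endian 32 bit data offset.
    named!(pub header<Header>,
        do_parse!(
            tag!("FLV") >>
            version: be_u8 >>
            data_offset: be_u32 >>
            (Header { version: version, data_offset: data_offset })
        )
    );

Calling header(data) then returns nom’s IResult, which is either Done with the remaining input and the parsed Header, Incomplete if more data is needed, or an Error.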

A FLV Demuxer in Rust

For implementing a demuxer, I decided on using the FLV container format. Mostly because it is super-simple compared to e.g. MP4 and WebM, but also because Geoffroy, the author of nom, wrote a simple header parsing library for it already and a prototype demuxer using it for VLC. I’ll have to extend that library for various features in the near future though, if the demuxer should ever become feature-equivalent with the existing one in GStreamer.

As usual, the code can be found here, in the “demuxer” branch. The most relevant files are rsdemuxer.rs and flvdemux.rs.

Following the style of the sources and sinks, the first is some kind of base class / trait for writing arbitrary demuxers in Rust. It’s rather unfinished at this point though, just enough to get something running. All the FLV specific code is in the second file, and it’s also very minimal for now. All it can do is to play one specific file (or hopefully all other files with the same audio/video codec combination).

As part of all this, I also wrote bindings for GStreamer’s buffer abstraction and a Rust-rewrite of the GstAdapter helper type. Both showed Rust’s strengths quite well, the buffer bindings by being able to express various concepts of the buffers in a compiler-checked, safe way in Rust (e.g. ownership, readability/writability), the adapter implementation by being so much shorter (it’s missing features… but still).
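
To illustrate the buffer point with a toy example (this is not the actual bindings API, just a sketch of the idea, with a plain Vec standing in for the real GstBuffer memory):

    // Read access works through a shared borrow, write access through an
    // exclusive one, so the compiler enforces at compile time that nobody
    // writes to a buffer that somebody else is currently reading from.
    struct Buffer {
        data: Vec<u8>,
    }

    impl Buffer {
        fn new(size: usize) -> Buffer {
            Buffer { data: vec![0; size] }
        }

        // Any number of read-only mappings can exist at the same time.
        fn map_read(&self) -> &[u8] {
            &self.data
        }

        // A writable mapping requires exclusive access to the buffer.
        fn map_write(&mut self) -> &mut [u8] {
            &mut self.data
        }
    }

    fn main() {
        let mut buffer = Buffer::new(16);
        buffer.map_write()[0] = 42;
        println!("first byte: {}", buffer.map_read()[0]);
    }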

So here we are, this can already play one specific file (at least) in any GStreamer based playback application. But some further work is necessary, for which I hopefully have some time in the near future. Various important features are still missing (e.g. other codecs, metadata extraction and seeking), the code is rather proof-of-concept style (stringly-typed media formats, lots of unimplemented!() and .unwrap() calls). But it shows that writing media handling elements in Rust is definitely feasible, and generally seems like a good idea.

If only we had Rust already when all this media handling code in GStreamer was written!

State Handling

Another reason why all this took a bit longer than expected, is that I experimented a bit with expressing the state of the demuxer in a more clever way than what we usually do in C. If you take a look at the GstFlvDemux struct definition in C, it contains about 100 lines of field declarations. Most of them are only valid / useful in specific states that the demuxer is in. Doing the same in Rust would of course also be possible (and rather straightforward), but I wanted to try to do something better, especially by making invalid states unrepresentable.

Rust has this great concept of enums, also known as tagged unions or sum types in other languages. These are not to be confused with C enums or unions, but instead allow multiple variants (like C enums) with fields of various types (like C unions). But all of that in a type-safe way. This seems like the perfect tool for representing complicated state and building a state machine around it.

So much for the theory. Unfortunately, I’m not too happy with the current state of things. It is like this mostly because of Rust’s ownership system getting in my way (rightfully so; how would it have known about additional constraints that I didn’t know how to express?).

Common Parts

The first problem I ran into was that many of the states have common fields.
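
For example, a reduced sketch could look like this (the field and variant names are made up for illustration and are not the exact ones from the demuxer):

    struct Header {
        version: u8,
    }

    struct AudioFormat {
        rate: u32,
    }

    enum State {
        Stopped,
        NeedHeader,
        // Both of the following variants need the parsed header and the
        // selected audio format, so the fields are repeated.
        Skipping {
            header: Header,
            audio: Option<AudioFormat>,
            skip_left: u32,
        },
        Streaming {
            header: Header,
            audio: Option<AudioFormat>,
        },
    }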

When writing code that matches on this and tries to move from one state to another, these common fields would have to be moved to the new variant. But unfortunately they are (usually) already borrowed by the code at that point and thus can’t be moved. E.g. the following fails to compile.
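
(A sketch using the made-up types from above; the exact code in the demuxer differs, but it fails for the same reason.)

    struct Demuxer {
        state: State,
    }

    impl Demuxer {
        fn finish_skipping(&mut self) {
            match self.state {
                // error[E0507]: cannot move out of borrowed content.
                // `header` and `audio` would have to be moved out of
                // `self.state`, but it is only reachable via `&mut self`.
                State::Skipping { header, audio, skip_left: 0 } => {
                    self.state = State::Streaming {
                        header: header,
                        audio: audio,
                    };
                }
                _ => (),
            }
        }
    }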

A Tree of States

Repeating the common parts is not nice anyway, so I went with a different solution by creating a tree of states:
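
Something along these lines (again an illustrative sketch, reusing the made-up Header and AudioFormat types from above; apart from to_skip and have_header, which the following paragraphs refer to, the names are invented):

    enum State {
        Stopped,
        NeedHeader,
        // Everything that needs the header lives below this variant, so the
        // common fields exist exactly once.
        HaveHeader {
            header: Header,
            have_header: HaveHeaderState,
        },
    }

    enum HaveHeaderState {
        // Skip the remaining bytes until the first tag.
        Skipping { to_skip: u32 },
        Streaming { audio: Option<AudioFormat> },
    }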

Apart from making it difficult to find names for all of these, and leading to relatively deeply nested code, this works.
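
The state transitions then end up looking roughly like this sketch (the Demuxer struct simply holds the new State from above; to_skip and have_header are the names the next paragraph refers to):

    impl Demuxer {
        fn update_state(&mut self) {
            match self.state {
                State::HaveHeader { ref mut have_header, .. } => {
                    match *have_header {
                        // `to_skip` is matched by value and is a Copy type, so
                        // nothing inside `*have_header` stays borrowed and it
                        // can simply be overwritten with the next state.
                        HaveHeaderState::Skipping { to_skip } if to_skip == 0 => {
                            *have_header = HaveHeaderState::Streaming { audio: None };
                        }
                        _ => (),
                    }
                }
                _ => (),
            }
        }
    }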

If you look at the code however, this causes it to be much bigger than needed, and I’m also not sure yet how it would be possible to nicely move “backwards” one state if that situation ever appears. Also there is still the previous problem, although it shows up less often: if I matched on to_skip here by reference (or if it were not a Copy type), the compiler would prevent me from overwriting have_header for the same reasons as before.

So my question for the end: How are others solving this problem? How do you express your states and write the functions around them to modify the states?

Update

I actually implemented the state handling as a State -> State function before (and forgot about that), which seems conceptually the right thing to do. It however has a couple of other problems. Thanks for the suggestions so far, it seems like I’m not alone with this problem at least.
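
For reference, the idea of the State -> State approach is that each step consumes the old state and returns the new one, roughly like this sketch (again with the made-up types from above):

    // Because the function owns the state, the common fields can simply be
    // moved into the new variant without any borrows getting in the way.
    fn handle(state: State) -> State {
        match state {
            State::HaveHeader {
                header,
                have_header: HaveHeaderState::Skipping { to_skip },
            } if to_skip == 0 => State::HaveHeader {
                header: header,
                have_header: HaveHeaderState::Streaming { audio: None },
            },
            other => other,
        }
    }

Note that calling such a function on a state stored in a struct field requires temporarily moving the state out of the struct (e.g. via mem::replace).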

Update 2

I’ve now gone a bit closer to the C-style struct definition, as it makes the code less convoluted and allows me to just move forward with the code. The current status can be seen here now, which also involves further refactoring (and e.g. some support for metadata).

Writing GStreamer Elements in Rust (Part 2): Don’t panic, we have better assertions now – and other updates

This is part 2, the other parts can be found here: part 1 and part 3

It’s been a while since the last article about writing GStreamer plugins in Rust, so here is a short (or not so short?) update of the changes since back then. You might also want to attend my talk at the GStreamer Conference on 10-11 October 2016 in Berlin about this topic.

At this point it’s still only the same elements as before (HTTP source, file sink and source), but the next step is going to be something more useful (an element processing actual data; whether it’ll be a parser or a demuxer is not decided yet) now that I’m happy with the general infrastructure. You can still find the code in the same place as before on GitHub here, and that’s also where all updates are going to be.

The main sections here will be Error Handling, Threading and Asynchronous IO.

Error Handling & Panics

First of all let’s get started with a rather big change that shows some benefits of Rust over C. There are two types of errors we care about here: expected errors and unexpected errors.

Expected Errors

In GLib based libraries we usually report errors with some kind of boolean return value plus an optional GError that allows further information about what exactly went wrong to be propagated to the caller, but also to the user. Bindings sometimes convert these directly into exceptions of the target language or whatever construct exists there.

Unfortunately, in GStreamer we use GErrors not very often. Consider for example GstBaseSink (in pseudo-C++/Java/… for simplicity):

For start()/stop() there is just a boolean, for render() there is at least an enum with a few variants. This is far from ideal, so what is additionally required of implementors of those virtual methods is that they post error messages with further details if something goes wrong. Those are propagated out of the normal control flow via the GstBus to the surrounding bins and in the end the application. It would be much nicer if instead we had GErrors there and made it mandatory for implementors to return one if something goes wrong. These could still be converted to error messages, but at a central place then. Something to think about for the next major version of GStreamer.

This is of course only for expected errors, that is, for things where we know that something can go wrong and want to report that.

Rust

In Rust this problem is solved in a similar way, see the huge chapter about error handling in the documentation. You basically return either the successful result, or something very similar to a GError:
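
That is, something along these lines, which is essentially how the standard library defines it:

    // Either the operation succeeded and we get the value (Ok), or it failed
    // and we get an error describing what went wrong (Err).
    enum Result<T, E> {
        Ok(T),
        Err(E),
    }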

Result is the type behind that, and it comes with a convenient macro for propagating errors upwards (try!()), methods for chaining multiple failing calls and/or converting errors (map(), and_then(), map_err(), or_else(), etc.), and there are libraries that make defining errors, with all the glue code required for combining different error types from different parts of the code, easier.

Similar to Result, there is also Option, which can be Some(x) or None, to signal the possible absence of a value. It works similarly and has a similar API, but is generally not meant for error handling. It’s now used instead of GST_CLOCK_TIME_NONE (aka u64::MAX) to signal the absence of e.g. a stop position of a seek, or the absence of a known size of the stream. It’s more explicit than giving one specific integer value a completely different meaning from all the others.
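
As a small sketch of what that looks like at the boundary to the C API (the function names here are made up):

    // GST_CLOCK_TIME_NONE is just u64::MAX in C; on the Rust side the absence
    // of a value is expressed in the type instead of in a magic value.
    fn to_optional_time(gst_time: u64) -> Option<u64> {
        if gst_time == std::u64::MAX {
            None
        } else {
            Some(gst_time)
        }
    }

    fn to_gst_time(time: Option<u64>) -> u64 {
        time.unwrap_or(std::u64::MAX)
    }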

How is this different?

The most important difference from my point of view here is that you must handle errors in one way or another. Otherwise the compiler won’t accept your code. If something can fail, you must explicitly handle this and can’t just silently ignore the possibility of failure. In C, on the other hand, people tend to just ignore error return values and assume that things went fine.

What’s ErrorMessage and FlowError, what else?

As you probably expect, ErrorMessage maps to the GStreamer concept of error messages and contains exactly the same kind of information. In Rust this is implemented slightly differently, but in the end it results in the same thing. The main difference here is that whenever e.g. start fails, you must provide an error message and can’t just fail silently. That error message can then be used by the caller, and e.g. be posted on the bus (and that’s exactly what happens).

FlowError is basically the negative part (the errors or otherwise non-successful results) of GstFlowReturn:
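
Roughly like the following (the exact definition in the repository might differ slightly; ErrorMessage is the type described above):

    pub enum FlowError {
        NotLinked,
        Flushing,
        Eos,
        NotNegotiated(ErrorMessage),
        Error(ErrorMessage),
    }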

Similarly, for the actual errors (NotNegotiated and Error), an actual error message must be provided and that then gets used by the caller (and is posted on the bus).

And in the same way, if setting a URI fails we now return a Result<(), UriError>, which then reports the error properly to GStreamer.

In summary, if something goes wrong, we know about that, have to handle/report that and have an error message to post on the bus.

Macros are awesome

As a side-note, creating error messages for GStreamer is not too convenient, as they want information like the current source file, line number, function, etc. Like in C, I’ve created a macro to make building such an error message easy. Unlike in C, macros in Rust are actually awesome and not just arbitrary text substitution. Instead they work via pattern matching and allow you to distinguish all kinds of different cases, can be recursive and are somewhat typed (expression vs. statement vs. block of code vs. type name, …).
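
To give an idea of how such a macro can look, here is a simplified sketch (not the actual macro from the repository, and with a stripped-down ErrorMessage just for the example):

    #[derive(Debug)]
    pub struct ErrorMessage {
        error: String,
        debug: String,
        file: &'static str,
        module: &'static str,
        line: u32,
    }

    // The built-in file!(), module_path!() and line!() macros expand at the
    // call site, so the error message automatically carries the location of
    // the code that created it.
    macro_rules! error_msg {
        ($error:expr, $debug:expr) => {
            ErrorMessage {
                error: String::from($error),
                debug: String::from($debug),
                file: file!(),
                module: module_path!(),
                line: line!(),
            }
        };
    }

    fn open_failed() -> ErrorMessage {
        error_msg!("Could not open resource", "open() returned EACCES")
    }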

Unexpected Errors

So far this was about expected errors, which have to be handled explicitly in Rust but not in C, and for which we have some kind of data structure to pass around. What about the other cases, the errors that will never happen (but usually do sooner or later) because your program would be completely broken then and all your assumptions wrong, and you wouldn’t know what to do in those cases anyway?

In C with GLib we usually have 3 ways of handling these. 1) Not at all (and crashing, doing something wrong, deadlocking, deleting all your files, …), 2) explicitly asserting what the assumptions in the code are and crashing cleanly otherwise (SIGABRT), or 3) returning some default value from the function, i.e. just returning immediately and printing a warning instead of going on.

None of these 3 approaches allows the application to handle the problem in any way, which seems fair because these cases should never happen, and if they do we wouldn’t know what to do anyway. 1) is obviously the least desirable but the most common, 3) is only slightly better (you get a warning, but usually sooner or later something will crash anyway because you’re in an inconsistent state) and 2) is the cleanest. However 2) is nothing you really want either: your application should somehow be able to return to a clean state if it can (by e.g. storing the current user data, stopping everything and loading up a new UI with the stored user data and some dialog).

Rust

Of course no Rust code should ever run into case 1) above and randomly crash, cause memory corruption or similar. But of course this can still happen due to bugs in Rust itself, the use of unsafe code, or code wrapping e.g. a C library. There’s not really anything that can be done about this.

For the other two cases there is something, however: catching panics. Whenever something goes wrong in unexpected ways, the corresponding Rust code can call the panic!() macro in one way or another. Like via assertions, or when “asserting” that a Result is never the error case by calling unwrap() on it (you don’t have to handle errors, but you have to explicitly opt in to ignoring them by calling unwrap()).

What happens from there on is similar to exception handling in other languages (unless you compiled your code so that panics automatically kill the application). The stack gets unwound, everything gets cleaned up on the way, and at some point either everything stops or someone catches that. The boundary for the unwinding is either your main() in Rust, or if the code is called from C, then at that exact point (i.e. for the GStreamer plugins at the point where functions are called from GStreamer).

So what?

At the point where GStreamer calls into the Rust code, we now catch all unwinds that might happen and remember that one happened. This is then converted into a GStreamer error message (so that the application can handle that in a meaningful way), and by remembering it we prevent any further calls into the Rust code, immediately turning them into error messages too and returning.
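
The building block for this in Rust is std::panic::catch_unwind. Conceptually the glue code does something like the following sketch (simplified, using a plain String as error type; the real code in the repository is more involved and produces proper GStreamer error messages):

    use std::panic::{self, AssertUnwindSafe};
    use std::sync::atomic::{AtomicBool, Ordering};

    // Run a piece of element code, catch any panic and remember it, so that
    // all further calls fail immediately instead of running on broken state.
    fn call_element_code<F>(panicked: &AtomicBool, func: F) -> Result<(), String>
    where
        F: FnOnce() -> Result<(), String>,
    {
        if panicked.load(Ordering::Relaxed) {
            return Err(String::from("Element panicked earlier"));
        }

        match panic::catch_unwind(AssertUnwindSafe(func)) {
            Ok(result) => result,
            Err(_) => {
                panicked.store(true, Ordering::Relaxed);
                Err(String::from("Panic in element code"))
            }
        }
    }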

This keeps the inconsistent state inside the element and allows the application to e.g. remove the element and replace it with something else, restart the pipeline, or do whatever else it wants to do. Assertions are now always local to the element and are not going to take down the whole application!

Threading

The other major change that happened is that Sink and Source are now single-threaded. There is no reason why the code would have to worry about threading as everything happens in exactly one thread (the streaming thread), except for the setting/getting of the URI (and possibly other “one-time” settings in the future).

To solve that, at the translation layer between C and Rust there is now a (Rust!) wrapper object that handles all the threading (in Rust with Mutexes, which work like the ones in C++, or atomic booleans/integers), stores the URI separately from the Source/Sink and just passes the URI to the start() function. This made the code much cleaner and made it even simpler to write new sources or sinks. No more multi-threading headaches.
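
Conceptually, the wrapper looks something like this sketch (heavily simplified; it refers to the Source trait shown in the next section and leaves out all error details and URI validation):

    use std::sync::Mutex;
    use url::Url;

    // The URI can be set from application threads, so it lives behind a
    // Mutex; the source itself is only ever used from the streaming thread,
    // but the wrapper still protects it so the whole thing stays thread-safe.
    pub struct SourceWrapper<T: Source> {
        uri: Mutex<Option<Url>>,
        source: Mutex<T>,
    }

    impl<T: Source> SourceWrapper<T> {
        pub fn set_uri(&self, uri: Option<Url>) {
            *self.uri.lock().unwrap() = uri;
        }

        // Called when the element starts: hand the stored URI to start(), so
        // the Source implementation never has to care about threading at all.
        pub fn start(&self) -> bool {
            let uri = self.uri.lock().unwrap().clone();
            match uri {
                Some(uri) => self.source.lock().unwrap().start(uri).is_ok(),
                None => false,
            }
        }
    }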

I think that we should in general move to such a simpler model in GStreamer and not require a full-fledged, multi-threaded GstElement subclass to be implemented, but instead something more use-case oriented (Source, sink, encoder, decoder, …) that has a single threaded API and hides all the gory details of GstElement. You don’t have to know these in most cases, so you shouldn’t have to know them as is required right now.

Simpler Source/Sink Traits

Overall the two traits look like this now, and that’s all you have to implement for a new source or sink:
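
Approximately like this (reconstructed from the description in this post, so the exact signatures in the repository may differ a bit; Url comes from the url crate, ErrorMessage and FlowError are the error types described above):

    pub trait Source {
        fn is_seekable(&self) -> bool;
        fn get_size(&self) -> Option<u64>;
        fn start(&mut self, uri: Url) -> Result<(), ErrorMessage>;
        fn stop(&mut self) -> Result<(), ErrorMessage>;
        fn fill(&mut self, offset: u64, data: &mut [u8]) -> Result<usize, FlowError>;
        fn seek(&mut self, start: u64, stop: Option<u64>) -> Result<(), ErrorMessage>;
    }

    pub trait Sink {
        fn start(&mut self, uri: Url) -> Result<(), ErrorMessage>;
        fn stop(&mut self) -> Result<(), ErrorMessage>;
        fn render(&mut self, data: &[u8]) -> Result<(), FlowError>;
    }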

Asynchronous IO

The last time I mentioned that a huge missing feature was asynchronous IO, in a composable way. There is some news here now: there’s an abstract implementation of futures and a set of higher-level APIs around mio for doing actual IO, called tokio. Independent of that there’s also futures-cpupool, which allows arbitrary computations to be run as futures on the threads of a thread pool.

Recently the HTTP library Hyper, as used by the HTTP source (and Servo), also got a branch that moves it to tokio to allow asynchronous IO. Once that has landed, it can relatively easily be used inside the HTTP source to allow interrupting HTTP requests at any time.

It seems like this area is moving in a very promising direction now, solving my biggest technical concern in a very pleasant way.

Writing GStreamer plugins and elements in Rust

This is part 1, the other parts can be found here: Part 2 and part 3

This weekend we had the GStreamer Spring Hackfest 2016 in Thessaloniki, my new home town. As usual it was great meeting everybody in person again, having many useful and interesting discussions and seeing what everybody was working on. It seems like everybody was quite productive during these days!

Apart from the usual discussions, cleaning up some of our Bugzilla backlog and getting various patches reviewed, I was working with Luis de Bethencourt on writing a GStreamer plugin with a few elements in Rust. Our goal was to be able to read and write a file, i.e. implement something like the “cp” command around gst-launch-1.0 using just the new Rust elements, while trying to have as little code written in C as possible and having a more-or-less general Rust API in the end for writing more source and sink elements. That’s all finished, including support for seeking, and I also wrote a small HTTP source.

For the impatient, the code can be found here: https://github.com/sdroege/rsplugin

Why Rust?

Now you might wonder why you would want to go through all the trouble of creating a bridge between GStreamer in C and Rust for writing elements. Other people have written much better texts about the advantages of Rust, which you might want to refer to if you’re interested: The introduction of the Rust documentation, or this free O’Reilly book.

But for myself the main reasons are that

  1. C is a rather antique and inconvenient language if you compare it to more modern languages, and Rust provides a lot of features from higher-level languages while still not introducing all the overhead that comes with them elsewhere, and
  2. even more important are the safety guarantees of the language, including the strong type system and the borrow checker, which make whole categories of bugs much more unlikely to happen. This saves you time during development, but it also saves your users from having their applications crash on them in the best case, or their precious data being lost or stolen in the worst.

Rust is not the panacea for everything, and not even the perfect programming language for every problem, but I believe it has a lot of potential in various areas, including multimedia software where you have to handle lots of complex formats coming from untrusted sources and still need to provide high performance.

I’m not going to write a lot about the details of the language; for that just refer to the website and the very well written documentation. But, although it is a very young language not developed by a Fortune 500 company (it is developed by Mozilla and many volunteers), it is nowadays already being used in production at places like Dropbox or in Firefox (their MP4 demuxer, and in the near future the URL parser). It is also used by Mozilla and Samsung for their experimental, next-generation browser engine Servo.

The Code

Now let’s talk a bit about how it all looks. Apart from Rust’s standard library (for all the basics and file IO), we also use the url crate (Rust’s term for libraries) for parsing and constructing URLs, and the HTTP server/client crate called hyper.

On the C side we have all the boilerplate code for initializing a GStreamer plugin (plugin.c), which then directly calls into Rust code (lib.rs), which then calls back into C (plugin.c) for registering the actual GStreamer elements. The GStreamer elements themselves then have an implementation written in C (rssource.c and rssink.c), which is a normal GObject subclass of GstBaseSrc or GstBaseSink, but instead of doing the actual work in C it just calls into Rust code again. For that to work, some metadata is passed to the GObject class registration, including a function pointer to a Rust function that creates a new instance of the “machinery” of the element. This then implements the Source or Sink traits (similar to interfaces) in Rust (rssource.rs and rssink.rs):
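
At that early stage the Source trait looked roughly like the following sketch (a reconstruction for illustration, without the error handling that was added later, see the part 2 section above; Sink looks similar with a render() function instead of fill(), and Url comes from the url crate):

    // GstFlowReturn here is a Rust enum mirroring the C flow return values.
    pub trait Source {
        fn set_uri(&mut self, uri: Option<Url>) -> bool;
        fn get_size(&self) -> u64;
        fn start(&mut self) -> bool;
        fn stop(&mut self) -> bool;
        fn fill(&mut self, offset: u64, data: &mut [u8]) -> Result<usize, GstFlowReturn>;
        fn do_seek(&mut self, start: u64, stop: u64) -> bool;
    }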

And these traits (plus a constructor) are in the end all that has to be implemented in Rust for the elements (rsfilesrc.rs, rsfilesink.rs and rshttpsrc.rs).

If you look at the code, it’s all still a bit rough around the edges and missing many features (like actual error reporting back to GStreamer instead of printing to stderr), but it already works and the actual implementations of the elements in Rust are rather simple and fun. And even the interfacing with C code is quite convenient at the Rust level.

How to test it?

First of all you need to get Rust and Cargo, check the Rust website or your Linux distribution for details. This was all tested with the stable 1.8 release. And you need GStreamer plus the development files, any recent 1.x version should work.

What next?

The three implemented elements are not too interesting and were mostly an experiment to see how far we can get in a weekend. But the HTTP source for example, once more features are implemented, could become useful in the long term.

Also, in my opinion, it would make sense to consider using Rust for some categories of elements like parsers, demuxers and muxers, as traditionally these elements are rather complicated and have the biggest exposure to arbitrary data coming from untrusted sources.

And maybe in the very long term, GStreamer or parts of it can be rewritten in Rust. But that’s a lot of effort, so let’s go step by step to see if it’s actually worthwhile and build some useful things on the way there already.

For myself, the next step is going to be to implement something like GStreamer’s caps system in Rust (of which I already have the start of an implementation), which will also be necessary for any elements that handle specific data and not just an arbitrary stream of bytes, and it could probably be also useful for other applications independent of GStreamer.

Issues

The main problem with the current code is that all IO is synchronous. That is, if opening the file, creating a connection, reading data from the network, etc. takes a bit longer it will block until a timeout has happened or the operation finished in one way or another.

Rust currently has no support for non-blocking IO in its standard library, and also no support for asynchronous IO. The latter is being discussed in this RFC though, but it probably will take a while until we see actual results.

While there are libraries for all of this, having to depend on an external library for this is not great as code using different async IO libraries won’t compose well. Without this, Rust is still missing one big and important feature, which will definitely be needed for many applications and the lack of it might hinder adoption of the language.

PTP network clock support in GStreamer

In the last few days I was working at Centricular on adding PTP clock support to GStreamer. This is now mostly done, and the results of this work are public, but not yet merged into the GStreamer code base. It will need some further testing and code review; see the related bug report here.

You can find the current version of the code here in my freedesktop.org GIT repository. See at the very bottom for some further hints at how you can run it.

So what does that mean, how does it relate to GStreamer?

Precision Time Protocol

PTP is the Precision Time Protocol, a network protocol standardized by the IEEE (IEEE1588:2008) to synchronize the clocks between different devices in a network. It’s similar to the better-known Network Time Protocol (NTP, IETF RFC 5905), which is probably used by millions of computers out there to automatically set the local clock. Unlike NTP, PTP promises to give much more accurate results, up to microsecond (or even nanosecond, with PTP-aware network hardware) precision inside appropriate networks. PTP is part of a few broadcasting and professional media standards, like AES67, RAVENNA, AVB, SMPTE ST 2059-2 and others, for inter-device synchronization.

PTP comes in 3 different versions, the old PTPv1 (IEEE1588-2002), PTPv2 (IEEE1588-2008) and IEEE 802.1AS-2011. I’ve implemented PTPv2 via UDPv4 for now, but this work can be extended to other variants later.

GStreamer network synchronization support

So what does that mean for GStreamer? We are now able to synchronize to a PTP clock in the network, which allows multiple devices to accurately synchronize media to the same clock. This is useful in all scenarios where you want to play the same media on different devices, and want them all to be completely synchronized. You can probably imagine quite a few use cases for this yourself now, especially in the context of the “Internet of Things” but also for more normal things like video walls or just having multiple screens display the same thing in the same room.

This was already possible previously with the GStreamer network clock, but that clock implements a custom protocol that only other GStreamer applications can understand currently. See for example here, here or here. With the PTP clock we now get another network clock that speaks a standardized protocol and can interoperate with other software and hardware.

Performance, WiFi and other unreliable networks

When running the code, you will probably notice that PTP works very well in controlled and reliable networks (2-20 microseconds accuracy is what I got here). But it’s not that accurate in wireless networks or generally unreliable networks. It seems like in those networks the custom GStreamer network clock protocol currently works more reliably, partially by design.

Future

As a next step, at Centricular we’re going to look at implementing support for RFC 7273 in GStreamer, which allows media clocks to be signalled for RTP. This is part of e.g. AES67 and RAVENNA and would allow multiple RTP receivers to be perfectly synchronized against a PTP clock without any further configuration. And just for completeness, we’re probably going to release an NTP-based GStreamer clock in the near future too.

Running the code

If you want to test my code, you can run it for example against PTPd. And if you want to measure the accuracy of the clock, you can use the ptp-clock-reflector (or here, instructions in the README) that I wrote for testing; in a local wired network I got around 2-20 microseconds accuracy with it. A GStreamer example application can be found here, which just prints the local and remote PTP clock times. Other than that you can use it just like any other clock in any GStreamer pipeline you can imagine.

Improved GStreamer support for Blackmagic Decklink cards

In the last weeks I started to work on improving the GStreamer support for the Blackmagic Decklink cards. Decklink is Blackmagic’s product line for HDMI, SDI, etc. capture and playback cards, with drivers being available for Linux, Windows and Mac OS X. And on all platforms the same API is provided to access the devices.

GStreamer has had support for Decklink since some time in 2011; the initial plugin was written by David Schleef and seemed to work well enough for quite some time. But over the years people used GStreamer in more dynamic and complex pipelines, and we got a lot of reports recently about the plugin not working properly in such situations. Mostly there were problems with time and synchronization handling, but also other minor issues caused by the elements not using the source / sink base classes (GstBaseSrc and GstBaseSink). The latter was not easily possible because one device provides audio and video input/output at the same time, so the elements would intuitively need two source or sink pads. And as a side effect this also made it very difficult to use the decklink elements in a standard playback pipeline with playbin.

The rewritten plugin now has separate elements for the audio and video parts of a device, giving 4 different elements in the end (decklinkaudiosrc, decklinkvideosrc, decklinkaudiosink and decklinkvideosink). These are now based on the corresponding base classes, work inside playbin (including pausing and seeking) and also handle synchronization properly. The clock of the hardware is now exposed to the pipeline as a GstClock and even if the pipeline chooses a different clock, the elements will slave their internal clock to the master clock of the pipeline and make sure that synchronization is kept even if the pipeline clock and the Decklink hardware clock are drifting apart. Thanks to GStreamer’s extensive synchronization model and all the functionality that already exists in the GstClock class, this was much easier than it sounds.

The only downside of this approach is that it is now necessary to always have the video and audio element of one device inside the same pipeline, and also to keep their states managed by the pipeline or at least make sure that they both go to PLAYING state together. Also the audio elements won’t work without a corresponding video element, which is a limitation of the hardware. Video without audio works without problems though.

All the code is available from gst-plugins-bad GIT master.

And now?

So what’s next? Testing, a lot of testing! Especially with different hardware, on different platforms and in all kinds of situations. If you have one of these cards, please test the latest code that is available in gst-plugins-bad GIT master. Please report any bugs you notice in Bugzilla, ideally with information about your hardware and platform and including a GStreamer debug log with GST_DEBUG=decklink*:6.

Apart from that many features are still missing, for example all the different configurations that are available via an API should be exposed on the elements, or support for more audio channels and more video modes. If you need any of these features and want to help, feel free to write patches and provide them in Bugzilla.

Any help would be highly appreciated!

Web Engines Hackfest 2014

During the last days I attended the Web Engines Hackfest 2014 in A Coruña, which was kindly hosted by Igalia in their office. This gave me some time to work again on WebKit, and especially improve and clean up its GStreamer media backend, and even more important of course an opportunity to meet again all the great people working on it.

Apart from various smaller and bigger cleanups, and getting 12 patches merged (thanks to Philippe Normand and Gustavo Noronha reviewing everything immediately), I was working on improving the WebAudio AudioDestination implementation, which allows websites to programmatically generate audio via Javascript and output it. This should be in a much better state now.

But the biggest chunk of work was my attempt for a cleaner and actually working reimplementation of the Media Source Extensions. The Media Source Extensions basically allow a website to provide a container or elementary stream for e.g. the video tag via Javascript, and can for example be used to implement custom streaming protocols. This is still far from finished, but at least it already works on YouTube with their Javascript DASH player and should give a good starting point for finishing the implementation. I hope to have some more time for continuing this work, but any help would be welcome.

Next to the Media Source Extensions, I also spent some time on reviewing a few GStreamer related patches, e.g. the WebAudio AudioSourceProvider implementation by Philippe, or his patch to use GStreamer’s OpenGL library directly in WebKit instead of reimplementing many pieces of it.

I also took a closer look at Servo, Mozilla’s research browser engine written in Rust. It looks like a very promising and well designed project (both, Servo and Rust actually!). I’m sure I’ll play around with Rust in the near future, and will also try to make some time available to work a bit on Servo. Thanks also to Martin Robinson for answering all my questions about Servo and Rust.

And then there were of course lots of discussions about everything!

Good things are going to happen in the future, and WebKit still seems to be a very active project with enthusiastic people 🙂
I hope I’ll be able to visit next year’s Web Engines Hackfest again, it was a lot of fun.

GStreamer remote-controlled testing application for Android, iOS and more

Today I’ve released a little GStreamer testing application, mostly for iOS and Android. You can find it here: gst-launch-remote.

It starts a TCP server on port 9123 and accepts commands there, while the main application is a simple video widget and a play/pause button.

You can use it as follows (note that every OK or NOK comes from the application to tell you about the success or failure of a command):

Every GStreamer pipeline string is accepted here, but currently only a single video output is supported. And additionally you can enable sending GStreamer debug output via UDP to some IP/port with:

This will for now enable GST_LEVEL_DEBUG as debug level for everything. And you can listen for all the output with

Future ideas

Maybe this is something that could be integrated with gst-validate to be able to run the test scenarios on mobile devices, and get decent test coverage for them too.

GStreamer with hardware video codecs on iOS

Update: GIT master of cerbero should compile fine with XCode 6 for x86/x86-64 (simulator) too now

In the last few days I spent some time on getting GStreamer to compile properly with the XCode 6 preview release (which is since today available as a stable release), and make sure everything still works with iOS 8. This should be the case now with GIT master of cerbero.

So much for the boring part. But more important, iOS 8 finally makes the VideoToolbox API available as public API. This allows us to use the hardware video decoders and encoders directly, and opens lots of new possibilities for GStreamer usage on iOS. Before iOS 8 it was only possible to directly decode local files with the hardware decoders via the AVAssetReader API, which of course only allows rather constrained GStreamer usage.

We already had elements (for OS X) using the VideoToolbox API in the applemedia plugin in gst-plugins-bad, so I tried making them work on iOS too. This required quite a few changes, and in the end I rewrote big parts of the encoder element (which should also make it work better on OS X btw). But with GIT master of GStreamer you can now directly use the hardware codecs on iOS 8 by using the vtdec decoder element or the vtenc_h264 encoder element. There’s still a lot of potential for improvements but it’s working.

Notes

If you compile everything from GIT master, it should still be possible to use the same application binary with iOS 7 and earlier versions. Just make sure to use “-weak_framework VideoToolbox” for linking your application instead of “-framework VideoToolbox”. On earlier versions you just won’t be able to use the hardware codecs.

Currently compiling cerbero GIT master for iOS x86 and x86-64 will fail in libffi. Only the ARM variants work. So don’t build with “./cerbero-uninstalled -c config/cross-ios-universal.cbc” but the “cross-ios-arm7.cbc”. And if you need to run bootstrap first, run it from the 1.4 branch for now and then switch back to the master branch. I’m working on fixing that next week.

Concatenate multiple streams gaplessly with GStreamer

Earlier this month I wrote a new GStreamer element that is now integrated into core and will be part of the 1.6 release. It solves yet another commonly asked question on the mailing lists and IRC: How to concatenate multiple streams without gaps between them as if they were a single stream. This is solved by the concat element now.

Here are some examples of how it can be used:

If you run this in an application that also reports time and duration you will see that concat preserves the stream time, i.e. the position reporting goes back to 0 when switching to the next stream and the duration is always the one of the current stream. However the running time will be continuously increasing from stream to stream.

Also, as you can notice, this only works for a single stream (i.e. one video stream or one audio stream, not a container stream with audio and video). To gaplessly concatenate multiple streams that each contain multiple streams (e.g. one audio and one video track) one after another, a more complex pipeline involving multiple concat elements and the streamsynchronizer element will be necessary to keep everything synchronized.

Details

The concat element has request sinkpads, and it concatenates streams in the order in which those sinkpads were requested. All streams except for the currently playing one are blocked until the currently playing one sends an EOS event, and then the next stream will be unblocked. You can request and release sinkpads at any time, and releasing the currently playing sinkpad will cause concat to switch to the next one immediately.

Currently concat only works with segments in GST_FORMAT_TIME and GST_FORMAT_BYTES format, and requires all streams to have the same segment format.

From an application side you could implement the same behaviour as concat implements by using pad probes (waiting for EOS) and using pad offsets (gst_pad_set_offset()) to adjust the running times. But by using the concat element this should be a lot easier to implement.

GStreamer Playback API

Update: the code is now also available on GitHub which probably makes it easier for a few people to use it and contribute. Just send pull requests or create issues in the issue tracker of GitHub.

Over the last years I noticed that I was copying too much code to create simple GStreamer based playback applications. After talking to other people at GUADEC this year it was clear that this wasn’t only a problem on my side but a general problem. So here it is, a convenience API for creating GStreamer based playback applications: GstPlayer.

The API is really simple but is still missing many features:

In addition to that there are a few other properties (which are not only exposed as setters/getters but also as GObject properties), and signals to be notified about position changes, errors, end-of-stream and other useful information. You can find the complete API documentation here. In general the API is modeled similarly to other existing APIs like Android’s MediaPlayer and iOS’ AVPlayer.

Included are also some very, very simple commandline, GTK+, Android (including nice Java bindings) and iOS apps. An APK for the Android app can also be found here. It provides no way to start a player, but whenever there is a video or audio file to be played it will be proposed as a possible application via the Android intent system.

In the end the goal is to have a replacement for most of the GStreamer code in e.g. GNOME‘s Totem video player, Enlightenment‘s Emotion or really any other playback application, and then have it integrated in a gst-plugins-base library (or a separate module with other convenience APIs).

While this is all clearly only the start, I hope that people already take a look at this and consider using it for their projects, provide patches, or help making the included sample apps really useful and nice-looking. Apps for other platforms (e.g. a Qt app, or one written in other languages like C# or Python) would also be nice to have. And if you’re an Android or iOS or Qt developer and have no idea about GStreamer you can still help by creating an awesome user interface 🙂 Ideally I would like to get the Android and iOS app into such a good shape that we can upload them to the app stores as useful GStreamer playback applications, which we could then also use to point people to a good demo.

If you’re interested and have some time to work on it or try it, please get in contact with me.