Writing GStreamer Elements in Rust (Part 3): Parsing data from untrusted sources like it’s 2016

This is part 3, the older parts can be found here: part 1, part 2 and part 4

And again it took quite a while to write a new update about my experiments with writing GStreamer elements in Rust. The previous articles can be found here and here. Since last time, there was also the GStreamer Conference 2016 in Berlin, where I had a short presentation about this.

Progress was rather slow unfortunately, due to work and other things getting into the way. Let’s hope this improves. Anyway!

There will be three parts again, and especially the last one would be something where I could use some suggestions from more experienced Rust developers about how to solve state handling / state machines in a nicer way. The first part will be about parsing data in general, especially from untrusted sources. The second part will be about my experimental and current proof of concept FLV demuxer.

Parsing Data

Safety?

First of all, you probably all saw a couple of CVEs about security relevant bugs in (rather uncommon) GStreamer elements going around. While all of them would’ve been prevented by having the code written in Rust (due to by-default array bounds checking), that’s not going to be our topic here. They also would’ve been prevented by using various GStreamer helper API, like GstByteReader, GstByteWriter and GstBitReader. So just use those, really. Especially in new code (which is exactly the problem with the code affected by the CVEs, it was old and forgotten). Don’t do an accountant’s job, counting how much money/many bytes you have left to read.

But yes, this is something where Rust will also provide an advantage by having by-default safety features. It’s not going to solve all our problems, but at least some classes of problems. And sure, you can write safe C code if you’re careful but I’m sure you also drive with a seatbelt although you can drive safely. To quote Federico about his motivation for rewriting (parts of) librsvg in Rust:

Every once in a while someone discovers a bug in librsvg that makes it all the way to a CVE security advisory, and it’s all due to using C. We’ve gotten double free()s, wrong casts, and out-of-bounds memory accesses. Recently someone did fuzz-testing with some really pathological SVGs, and found interesting explosions in the library. That’s the kind of 1970s bullshit that Rust prevents.

You can directly replace the word librsvg with GStreamer here.

Ergonomics

The other aspect with parsing data is that it’s usually a very boring aspect of programming. It should be as painless as possible, as easy as possible to do it in a safe way, and after having written your 100th parser by hand you probably don’t want to do that again. Parser combinator libraries like Parsec in Haskell provide a nice alternative. You essentially write down something very close to a formal grammar of the format you want to parse, and out of this comes a parser for the format. Other than parser generators like good, old yacc, everything is written in target language though, and there is no separate code generation step.

Rust, being quite a bit more expressive than C, also made people write parser generator libraries. They are all not as ergonomic (yet?) as in Haskell, but still a big improvement over anything else. There’s nom, combine and chomp. All having a slightly different approach. Choose your favorite. I decided on nom for the time being.

A FLV Demuxer in Rust

For implementing a demuxer, I decided on using the FLV container format. Mostly because it is super-simple compared to e.g. MP4 and WebM, but also because Geoffroy, the author of nom, wrote a simple header parsing library for it already and a prototype demuxer using it for VLC. I’ll have to extend that library for various features in the near future though, if the demuxer should ever become feature-equivalent with the existing one in GStreamer.

As usual, the code can be found here, in the “demuxer” branch. The most relevant files are rsdemuxer.rs and flvdemux.rs.

Following the style of the sources and sinks, the first is some kind of base class / trait for writing arbitrary demuxers in Rust. It’s rather unfinished at this point though, just enough to get something running. All the FLV specific code is in the second file, and it’s also very minimal for now. All it can do is to play one specific file (or hopefully all other files with the same audio/video codec combination).

As part of all this, I also wrote bindings for GStreamer’s buffer abstraction and a Rust-rewrite of the GstAdapter helper type. Both showed Rust’s strengths quite well, the buffer bindings by being able to express various concepts of the buffers in a compiler-checked, safe way in Rust (e.g. ownership, reability/writability), the adapter implementation by being so much shorter (it’s missing features… but still).

So here we are, this can already play one specific file (at least) in any GStreamer based playback application. But some further work is necessary, for which I hopefully have some time in the near future. Various important features are still missing (e.g. other codecs, metadata extraction and seeking), the code is rather proof-of-concept style (stringly-typed media formats, lots of unimplemented!() and .unwrap() calls). But it shows that writing media handling elements in Rust is definitely feasible, and generally seems like a good idea.

If only we had Rust already when all this media handling code in GStreamer was written!

State Handling

Another reason why all this took a bit longer than expected, is that I experimented a bit with expressing the state of the demuxer in a more clever way than what we usually do in C. If you take a look at the GstFlvDemux struct definition in C, it contains about 100 lines of field declarations. Most of them are only valid / useful in specific states that the demuxer is in. Doing the same in Rust would of course also be possible (and rather straightforward), but I wanted to try to do something better, especially by making invalid states unrepresentable.

Rust has this great concept of enums, also known as tagged unions or sum types in other languages. These are not to be confused with C enums or unions, but instead allow multiple variants (like C enums) with fields of various types (like C unions). But all of that in a type-safe way. This seems like the perfect tool for representing complicated state and building a state machine around it.

So much for the theory. Unfortunately, I’m not too happy with the current state of things. It is like this mostly because of Rust’s ownership system getting into my way (rightfully, how would it have known additional constraints I didn’t know how to express?).

Common Parts

The first problem I ran into, was that many of the states have common fields, e.g.

enum State {
    ...
    NeedHeader,
    HaveHeader {header: Header, to_skip: usize },
    Streaming {header: Header, audio: ... },
    ...
}

When writing code that matches on this, and that tries to move from one state to another, these common fields would have to be moved. But unfortunately they are (usually) borrowed by the code already and thus can’t be moved to the new variant. E.g. the following fails to compile

    match self.state {
        ...
        State::HaveHeader {header, to_skip: 0 } => {
            
            self.state = State::Streaming {header: header, ...};
        },
    }

A Tree of States

Repeating the common parts is not nice anyway, so I went with a different solution by creating a tree of states:

enum State {
    ...
    NeedHeader,
    HaveHeader {header: Header, have_header_state: HaveHeaderState },
    ...
}

enum HaveHeaderState {
    Skipping {to_skip: usize },
    Streaming {audio: ... },
}

Apart from making it difficult to find names for all of these, and having relatively deeply nested code, this works

    match self.state {
        ...
        State::HaveHeader {ref header, ref mut have_header_state } => {
            match *have_header_state {
                HaveHeaderState::Skipping { to_skip: 0 } => {
                    *have_header = HaveHeaderState::Streaming { audio: ...};
                }
        },
    }

If you look at the code however, this causes the code to be much bigger than needed and I’m also not sure yet how it will be possible nicely to move “backwards” one state if that situation ever appears. Also there is still the previous problem, although less often: if I would match on to_skip here by reference (or it was no Copy type), the compiler would prevent me from overwriting have_header for the same reasons as before.

So my question for the end: How are others solving this problem? How do you express your states and write the functions around them to modify the states?

Update

I actually implemented the state handling as a State -> State function before (and forgot about that), which seems conceptually the right thing to do. It however has a couple of other problems. Thanks for the suggestions so far, it seems like I’m not alone with this problem at least.

Update 2

I’ve went a bit closer to the C-style struct definition now, as it makes the code less convoluted and allows me to just get forwards with the code. The current status can be seen here now, which also involves further refactoring (and e.g. some support for metadata).