Writing GStreamer Elements in Rust (Part 3): Parsing data from untrusted sources like it’s 2016 – coaxion.net

This is part 3, the older parts can be found here: part 1, part 2 and part 4

And again it took quite a while to write a new update about my experiments with writing GStreamer elements in Rust. The previous articles can be found here and here. Since last time, there was also the GStreamer Conference 2016 in Berlin, where I had a short presentation about this.

Progress was rather slow unfortunately, due to work and other things getting into the way. Let’s hope this improves. Anyway!

There will be three parts again, and especially the last one would be something where I could use some suggestions from more experienced Rust developers about how to solve state handling / state machines in a nicer way. The first part will be about parsing data in general, especially from untrusted sources. The second part will be about my experimental and current proof of concept FLV demuxer.

Parsing Data

Safety?

First of all, you probably all saw a couple of CVEs about security relevant bugs in (rather uncommon) GStreamer elements going around. While all of them would’ve been prevented by having the code written in Rust (due to by-default array bounds checking), that’s not going to be our topic here. They also would’ve been prevented by using various GStreamer helper API, like GstByteReader, GstByteWriter and GstBitReader. So just use those, really. Especially in new code (which is exactly the problem with the code affected by the CVEs, it was old and forgotten). Don’t do an accountant’s job, counting how much money/many bytes you have left to read.

But yes, this is something where Rust will also provide an advantage by having by-default safety features. It’s not going to solve all our problems, but at least some classes of problems. And sure, you can write safe C code if you’re careful but I’m sure you also drive with a seatbelt although you can drive safely. To quote Federico about his motivation for rewriting (parts of) librsvg in Rust:

Every once in a while someone discovers a bug in librsvg that makes it all the way to a CVE security advisory, and it’s all due to using C. We’ve gotten double free()s, wrong casts, and out-of-bounds memory accesses. Recently someone did fuzz-testing with some really pathological SVGs, and found interesting explosions in the library. That’s the kind of 1970s bullshit that Rust prevents.

You can directly replace the word librsvg with GStreamer here.

Ergonomics

The other aspect with parsing data is that it’s usually a very boring aspect of programming. It should be as painless as possible, as easy as possible to do it in a safe way, and after having written your 100th parser by hand you probably don’t want to do that again. Parser combinator libraries like Parsec in Haskell provide a nice alternative. You essentially write down something very close to a formal grammar of the format you want to parse, and out of this comes a parser for the format. Other than parser generators like good, old yacc, everything is written in target language though, and there is no separate code generation step.

Rust, being quite a bit more expressive than C, also made people write parser generator libraries. They are all not as ergonomic (yet?) as in Haskell, but still a big improvement over anything else. There’s nom, combine and chomp. All having a slightly different approach. Choose your favorite. I decided on nom for the time being.

A FLV Demuxer in Rust

For implementing a demuxer, I decided on using the FLV container format. Mostly because it is super-simple compared to e.g. MP4 and WebM, but also because Geoffroy, the author of nom, wrote a simple header parsing library for it already and a prototype demuxer using it for VLC. I’ll have to extend that library for various features in the near future though, if the demuxer should ever become feature-equivalent with the existing one in GStreamer.

As usual, the code can be found here, in the “demuxer” branch. The most relevant files are rsdemuxer.rs and flvdemux.rs.

Following the style of the sources and sinks, the first is some kind of base class / trait for writing arbitrary demuxers in Rust. It’s rather unfinished at this point though, just enough to get something running. All the FLV specific code is in the second file, and it’s also very minimal for now. All it can do is to play one specific file (or hopefully all other files with the same audio/video codec combination).

As part of all this, I also wrote bindings for GStreamer’s buffer abstraction and a Rust-rewrite of the GstAdapter helper type. Both showed Rust’s strengths quite well, the buffer bindings by being able to express various concepts of the buffers in a compiler-checked, safe way in Rust (e.g. ownership, reability/writability), the adapter implementation by being so much shorter (it’s missing features… but still).

So here we are, this can already play one specific file (at least) in any GStreamer based playback application. But some further work is necessary, for which I hopefully have some time in the near future. Various important features are still missing (e.g. other codecs, metadata extraction and seeking), the code is rather proof-of-concept style (stringly-typed media formats, lots of unimplemented!() and .unwrap() calls). But it shows that writing media handling elements in Rust is definitely feasible, and generally seems like a good idea.

If only we had Rust already when all this media handling code in GStreamer was written!

State Handling

Another reason why all this took a bit longer than expected, is that I experimented a bit with expressing the state of the demuxer in a more clever way than what we usually do in C. If you take a look at the GstFlvDemux struct definition in C, it contains about 100 lines of field declarations. Most of them are only valid / useful in specific states that the demuxer is in. Doing the same in Rust would of course also be possible (and rather straightforward), but I wanted to try to do something better, especially by making invalid states unrepresentable.

Rust has this great concept of enums, also known as tagged unions or sum types in other languages. These are not to be confused with C enums or unions, but instead allow multiple variants (like C enums) with fields of various types (like C unions). But all of that in a type-safe way. This seems like the perfect tool for representing complicated state and building a state machine around it.

So much for the theory. Unfortunately, I’m not too happy with the current state of things. It is like this mostly because of Rust’s ownership system getting into my way (rightfully, how would it have known additional constraints I didn’t know how to express?).

Common Parts

The first problem I ran into, was that many of the states have common fields, e.g.

enum State {
    ...
    NeedHeader,
    HaveHeader {header: Header, to_skip: usize },
    Streaming {header: Header, audio: ... },
    ...
}

When writing code that matches on this, and that tries to move from one state to another, these common fields would have to be moved. But unfortunately they are (usually) borrowed by the code already and thus can’t be moved to the new variant. E.g. the following fails to compile

    match self.state {
        ...
        State::HaveHeader {header, to_skip: 0 } => {
            
            self.state = State::Streaming {header: header, ...};
        },
    }

A Tree of States

Repeating the common parts is not nice anyway, so I went with a different solution by creating a tree of states:

enum State {
    ...
    NeedHeader,
    HaveHeader {header: Header, have_header_state: HaveHeaderState },
    ...
}

enum HaveHeaderState {
    Skipping {to_skip: usize },
    Streaming {audio: ... },
}

Apart from making it difficult to find names for all of these, and having relatively deeply nested code, this works

    match self.state {
        ...
        State::HaveHeader {ref header, ref mut have_header_state } => {
            match *have_header_state {
                HaveHeaderState::Skipping { to_skip: 0 } => {
                    *have_header = HaveHeaderState::Streaming { audio: ...};
                }
        },
    }

If you look at the code however, this causes the code to be much bigger than needed and I’m also not sure yet how it will be possible nicely to move “backwards” one state if that situation ever appears. Also there is still the previous problem, although less often: if I would match on to_skip here by reference (or it was no Copy type), the compiler would prevent me from overwriting have_header for the same reasons as before.

So my question for the end: How are others solving this problem? How do you express your states and write the functions around them to modify the states?

Update

I actually implemented the state handling as a State -> State function before (and forgot about that), which seems conceptually the right thing to do. It however has a couple of other problems. Thanks for the suggestions so far, it seems like I’m not alone with this problem at least.

Update 2

I’ve went a bit closer to the C-style struct definition now, as it makes the code less convoluted and allows me to just get forwards with the code. The current status can be seen here now, which also involves further refactoring (and e.g. some support for metadata).

9 thoughts on “Writing GStreamer Elements in Rust (Part 3): Parsing data from untrusted sources like it’s 2016”

iqualfragile says:

November 25, 2016 at 10:36

one posibility would be to not match on the state by reference, but by value, so basically have a block/function with signature
fn(state: State) -> State
while this would in theory move a lot of memory around, i think the compiler should be able to optimize that quite easily.
iirc there is a rfc right now to make struct construction shorter if your variable(s) have the same name as the struct field(s), just like pattern matching. that would also help.

1. slomo says:
  
  November 25, 2016 at 11:09
  
  This makes things nicer, except that it does not work well if you store the state in some other struct (the demuxer in this case). You usually have that other struct mutably referenced. You could then use something like the take_mut crate in theory, but that has (documented) problems with panics (pull request sent)… and in the end the code becomes even more nested and verbose as you have to a) always create a new state in all code paths (you can’t just pass through the state in all no-op cases as it is moved or borrowed into your match branches), and b) because creating a new state is more code than just modifying one field (the changes you mentioned would help with that).
  
  That was actually the very first approach I was trying, completely forgot again. Thanks for reminding!
  
Luke Jones says:

November 27, 2016 at 06:13

Hmm, for a game framework I’m working on, I used something of a “super” class to manage the states, rather than let the states manage themselves.

It was probably the easiest way to get around a state trying to modify states – that’s a bit like an item in a vector trying to modify the vector it is in.

1. slomo says:
  
  November 27, 2016 at 10:22
  
  You mean having a simple state enum that is just enumerating the states, and storing all the other pieces in the same struct that stores this state enum? That’s basically what is done in the C GStreamer code and what I wanted to prevent. It causes you to have lots of fields, of which most of them are useless (and must not be used) in specific states
  
  1. Johan Ouwerkerk says:
    
    November 27, 2016 at 16:46
    
    Could you express it as a combination of pimpl/d-pointer for the states, with an inheritance tree of classes for each State data object? I.e. a ‘State’ comprises a type/state tag + data pointer. The data pointer is of a generic type (or just a box for a generically typed value) which may vary depending on each state, bounded by a supertype. Then you can handle common things via either composition or inheritance in your actual data type implementations.
    
    Not sure if in Rust that sort of indirection blows up the code to the point it is perhaps better to hold your nose and do it the C-way again. 🙂
  2. slomo says:
    
    November 27, 2016 at 18:23
    
    I think at the point you take out traits, things become even more verbose here. But I’m also not sure if I fully understand your proposal. Can you give some example code?
Pingback: Writing GStreamer plugins and elements in Rust – coaxion.net – slomo's blog
Pingback: Writing GStreamer Elements in Rust (Part 2): Don’t panic, we have better assertions now – and other updates – coaxion.net – slomo's blog
Pingback: How to write GStreamer Elements in Rust Part 1: A Video Filter for converting RGB to grayscale – coaxion.net – slomo's blog