The never-ending story: GStreamer and hardware integration

This is the companion article for my talk at the GStreamer Conference 2013: The never-ending story: GStreamer and hardware integration

The topic of hardware integration into GStreamer (think of GPUs, DSPs, hardware codecs, OpenMAX IL, OpenGL, VAAPI, video4linux, etc.) was always a tricky one. In the past, in the 0.10 release series, it often was impossible to make full and efficient use of special hardware features and many ugly hacks were required. In 1.x we reconsidered many of these problems and tried to adjust the existing APIs, or add new APIs to make it possible to solve these problems in a cleaner way. Although all these APIs are properly documented in detail, what’s missing is some kind of high-level/overview documentation to understand how it all fits together, which API to use when for what. That’s what I’m trying to solve here now.

What’s new in GStreamer 1.0 & 1.2

The relevant changes in GStreamer 1.x for hardware integration are the much better memory management and control with the GstMemory and GstAllocator abstractions and GstBufferPool, the support for arbitrary buffer metadata with GstMeta, sharing of device contexts between elements with GstContext and the reworked (re-)negotiation and (re-)configuration mechanisms, especially the new ALLOCATION query. In the following I will explain how to use them in the context of hardware integration and how they all work together and will assume that you already know and understand the basics of GStreamer.

Special memory

The most common problem when dealing with special hardware and the APIs to use it is, that it requires the usage of special memory. Maybe there are some constraints on the memory (e.g. regarding alignment or physical contiguous memory is required), or you need to use some special API to use the memory and it doesn’t behave like normal system memory. Examples for the second case would be VAAPI and OpenGL, where you have to use special APIs to do something with the memory, or the new DMA-BUF Linux kernel API, where you just pass around a file descriptor that might be mappable to the process’ virtual address space via mmap().

Custom memory types

In 0.10 special memory was usually handled by subclassing GstBuffer, and somehow ensuring that you could get a buffer of the correct type an it’s not copied to a normal GstBuffer in between. It was very unreliable and rather hackish usually.

In 1.0 it’s no longer possible to subclass GstBuffer, and GstBuffer is only a collection of memory abstraction objects and metadata. You would now implement a custom GstMemory and GstAllocator. GstMemory here is just representing an instance or handle to a specific memory but itself has no logic implemented. For the logic it would point to its corresponding GstAllocator. The reason for this split is mainly that a) there might be multiple allocators for a single memory type (think of the different ways of allocating an EGLImage), and b) that you want to negotiate the allocator and memory type between elements before allocating any of these memories.

This new abstraction of memory has explicit operations to request read, write or exclusive access to the memory and for usage like normal system memory requires an explicit call to gst_memory_map(). This mapping to system memory allows to use it from C like memory allocated via malloc(), but might not be possible for every type of memory. Subclasses of GstMemory could have to implement all these operations and map them to whatever this specific type of memory provides, and could provide additional API on top of that. The memory could also be marked as read-only, or could even be inaccessible for the process and just passed around as an opaque handle between elements.

Let me give some examples. For DMA-BUF you would for example implement the mapping to the virtual address space of the process via mmap() (which is done by GstDmabufMemory), for VAAPI you could implement additional API for the memory that could provide the memory as an GL texture id or convert it to a different color format (which is done by gst-vaapi), for memory that represents a GL texture id you could implement mapping to the process’ virtual address space by dispatching an operation to the GL thread and using glTexImage2D() or glReadPixels() and implement copying of memory in a similar way (which is done by gst-plugins-gl). You could also implement some kind of access control for the memory, which would allow easier implementation of DRM schemes (all engineers are probably dying a bit inside while reading this now…), or do crazy things like implementing some kind of file-backed memory.

With GstBuffer just being a collection of memories now, it provides convenience API to handle all the contained memories, e.g. to copy all of them, or map all of them to the process’ virtual address space.

Memory constraints

If all you need is just some memory that behaves like normal, malloc’d memory but fulfils some additional constraints like alignment or being physically contiguous you don’t need to implement your own GstMemory and GstAllocator subclass. The allocation function of GstAllocator, gst_allocator_alloc(), allows to provide parameters to specify such things and the default memory implementations handles this just fine.

For format specific constraints, like having a special rowstride of raw video frames, or having their planes in separate memory regions, something else can be done which I will mention later when talking about arbitrary buffer metadata.

So far I’m not aware of any special type of memory that can’t be handled with the new API, and I think it’s rather unlikely that there ever will be one. In the worst case it will just be a bit inconvenient to write the code.

Buffer pools

In many cases it will be useful to not just allocate a new block of memory whenever one is required, but instead keep track of memory in a pool and recycle memory that is currently unused. For example because allocations are just too expensive or because there is only a very limited number of memory blocks available anyway.

For this GstBufferPool is available. It provides a generic buffer pool class with standard functionality (setting of min/max number of buffers, pre-allocation of buffers, allocation parameters, allow/forbid dynamic allocation of buffers, selection of the GstAllocator, etc) and can be extended by subclasses with custom API and configuration.

As the most common case for buffer pools is for raw video, there’s also a GstVideoBufferPool subclass which already implements a lot of common functionality. For example it cares for allocating memory of the right size and caps handling, allows for configuration of custom rowstrides or using separate memory regions for the different planes and implements a custom configuration interface for specifying additional padding around each plane (GstVideoAlignment).

Arbitrary buffer metadata

In GStreamer 1.0 you can attach GstMeta to a buffer to represent arbitrary metadata, which can be used for all kinds of different things. GstMeta provides generic descriptions about the aspects of a buffer it applies to (tags) and functions to transform the meta (e.g. if a filter changes the aspect the meta applies too). The GstMeta supported by elements can also be negotiated between them to make sure all elements in a chain support a specific kind of metadata if it needs to care about it.

Examples for GstMeta usage would be the additional of simple metadata to buffers. Metadata like defining a rectangle with region of interests on a video frame (e.g. for face detection), or defining the gamma transfer function that applies to a video frame or defining custom rowstrides, or defining special downmix matrixes for converting 5.1 audio to stereo. While these are all examples that mostly apply to raw audio and video, it is also possible to implement metas that describe compressed formats, e.g. a meta that provides an MPEG sequence header as parsed C structures.

Apart from simple metadata it is also possible to use GstMeta for delayed processing, for example if all elements in a chain support a cropping metadata, it would be possible to not crop the video frame in the decoder but instead let the video sink handle the cropping based on a cropping rectangle put on the buffer as metadata. Or subtitles could be put on a buffer as metadata instead of overlaying them on top of the frame, and only in the sink after all other processing the subtitles would be overlaid on top of the video frame.

And then it’s also possible to use GstMeta to define “dynamic interfaces” on buffers. The metadata could for example contain a function pointer to a function that allows to upload the buffer to a GL texture, which could be simply implemented via glTexImage2() for normal memory or via special VAAPI for memory that is backed by VAAPI memory.

All the examples given in the previous paragraphs are already implemented by GStreamer and can just be re-used by other elements and we plan to add more generic GstMeta in the future. Look for GstVideoMeta, GstVideoCropMeta, GstVideoGLTextureUploadMeta, GstVideoOverlayMeta and GstAudioDownmixMeta for example.

Do not use GstMeta for memory specific information, e.g. as a lazy way to get around implementing a custom GstMemory and GstAllocator. It won’t work as optimal!

Negotiation

Compared to 0.10, negotiation between elements was extended by another step. There’s a) caps negotiation and b) allocation negotiation. Each of these happens before data-flow and after receiving a RECONFIGURE event.

caps negotiation

During caps negotiation the pads of the elements decide on one specific format that is supported by both linked pads and also make it possible for further downstream elements to decide on a specific format. Caps in GStreamer represent such a specific format and its properties, and is represented in a mime-type similar way with properties, e.g. “video/x-h264,width=1920,height=1080,profile=main”.

GstCaps can contain a list of such formats, represented as GstStructures, and each of these structures contain a name and a list of fields. The values of the fields can either be concrete, single values or describing multiple values (e.g. by using integer ranges). On top of caps many generic operations are defined to check compatibility of caps, to create intersections, to merge caps or to convert them into a single, fixed format (which is required during negotiation). If multiple structures are contained in a caps, they are ordered by preference.

All this is nothing new, but in GStreamer 1.2 caps were extended by GstCapsFeatures, which allows to add additional constraints to caps. For example a specific memory type can be requested or the existence of specific metadata. This is used during negotiation to make sure only compatible elements are linked (e.g. ones that can only work with EGLImage memory would not be seen as compatible with ones that only can handle VAAPI memory although everything else of the caps is the same), and is also used in autoplugging elements like playbin to select the most optimal decoder and sink combinations. An example of such caps with caps features would be “video/x-raw(memory:EGLImage),width=1920,height=1080” to specify that EGLImage memory is required.

So, the actual negotiation of the caps happens from upstream to downstream. The most downstream element uses a CAPS query, which replaces the getcaps function of pads from 0.10 and has an additional filter argument in 1.0. The filter argument is used to tell downstream which caps are actually supported by upstream and in which preference, so it don’t have to create unnecessary caps. This query is then propagated further downstream to the next elements, translated if necessary (e.g. from raw video caps to h264 caps by an encoder, while still proxying the values of the width/height fields), then answered by the most downstream element that doesn’t propagate the query further downstream and filled with the caps supported by this element’s pad. On the way back upstream these result caps are further narrowed down, translated between different formats and potentially reordered by preference. Once the results are received in the most upstream element, it tries to choose on specific format from those supported by downstream and itself, while keeping preferences of itself and downstream under consideration. Then as a last step it sends a CAPS event downstream to notify about the actually chosen caps. This CAPS event is then further propagated downstream by elements (if necessary translated) that don’t need further information to decide on their output format, or stored and later when all necessary information is available (e.g. after the header buffers are received and parsed) a new CAPS event is sent downstream with the new caps.

This is all very similar to how the same worked in 0.10.

allocation negotiation

The next step, the allocation negotiation, is something new in 1.0. The ALLOCATION query is used by every most upstream element that needs to allocate buffers (i.e. sources, converters, decoders, encoders, etc) to get information about the allocation possibilities from downstream.

The ALLOCATION query is created with the caps that the upstream element wants first answered by the most downstream element that by itself would allocate new buffers again for its own downstream (i.e. converters, decoders, encoders). It’s filled with any buffer pools that can be provided by downstream, allocators (and thus memory types), the minimum and maximum number of buffers, the allocation size, allocation parameters and the supported metas. On its way back upstream these results are filtered and narrowed down by the further upstream elements (e.g. to remove metas that are not supported by an element, or a memory type), and especially the minimum and maximum number of buffers is updated by each element. The querying element will then decide to use one of the buffer pools and/or allocators and which metas can be used and which not. It can also decide to not use any of the buffer pools or allocators and use its own that is compatible with the results of the query.

The usage of downstream buffer pools or allocators is the 1.0 replacement of gst_pad_alloc_buffer() in 0.10, and its much more efficient as the complete way downstream does not have to be walked for every buffer allocation but only once.

Note: the caps in the ALLOCATION query are not required to be the same as in the CAPS event. For example if the video crop metadata is used, the caps in the CAPS event will contain the cropped width and height while the caps in the ALLOCATION query will contain the uncropped (larger) width and height.

Context sharing

Another problem that often needs to be solved for hardware integration (but also other use cases), is the sharing of some kind of context between elements. Maybe even before these elements are linked together. For example you might need to share a VADisplay or EGLDisplay between your decoder and your sink, or an OpenGL context (plus thread dispatching functionality), or unrelated to hardware integration you might want to share an HTTP session (e.g. cookies) with multiple elements.

In GStreamer 1.2 support for doing this was added to GStreamer core via GstContext.

GstContext allows sharing such a context in a generic way via a name to identify the type of context and a GstStructure to store various information about it. It is distributed in the pipeline the following way: Whenever an element needs a specific context it first checks if one of the correct type was set on it before (for which the GstElement::set_context vfunc would’ve been called). Otherwise it will query downstream and then upstream with the CONTEXT query to retrieve any locally known contexts of the correct type (e.g. a decoder could get a context from a sink). If this doesn’t give any result either, the element is posting a NEED_CONTEXT message on the bus. This is then sent upwards in the bin hierarchy and if one of the containing bins knows a context of the correct type it will synchronously set it on the element. Otherwise the application will get the NEED_CONTEXT message and has the possibility to set a context from a sync bus handler on the element. If the element still has no context set, it can create one itself and advertise it in the bin hierarchy and to the application with the HAVE_CONTEXT message. When receiving this message, bins will cache the contexts and will use them the next time a NEED_CONTEXT message is seen. The same goes for contexts that were set on a bin, those will also be cached by the bin for future use.

All in all this does not require any interaction with the application, unless the application wants to configure which contexts are set on which elements.

To make the distributing of contexts more reliable, decodebin has a new “autoplug-query” signal, which can be used to forward CONTEXT (and ALLOCATION) queries of not-yet-exposed elements during autoplugging to downstream elements (e.g. sinks). This is also used by playbin to make sure elements can reach the sinks during autoplugging.

Open issues

Does this mean that GStreamer is complete now? No 🙂 There are still some open issues around this, but these should be fixable in 1.x without any problems. All the building blocks are there. Overall hardware integration is already working much much better than in 0.10.

Reconfiguration

One remaining issue is the handling of device reconfiguration. Often it is required to release all memory allocated from a device before being able to set a new configuration on the device and allocate new memory. The problem with this is that GStreamer elements currently have no control about downstream or upstream elements that use their memory, there is no API to request them to release memory. And even if there was it would be a bit tricky to integrate into 3rd party libraries like libav, as they keep references to memory internally for things like reference frames.

Related to this is that currently buffer pools only keep track of buffers, but there’s nothing that makes it easy to keep track of memories. Implementing this over and over again in GstAllocator subclasses is error-prone and inconvenient.

All discussions about this and some initial ideas are currently handling in Bugzilla #707534

Device probing

The other remaining issue is missing API for device probing, e.g. to get all available camera devices together with the GStreamer elements that can be used to access them. In 0.10 there was a very suboptimal solution for this, and it was removed for 1.0 without any replacement so far.

There is a patch with a new implementation of this in Bugzilla #678402. The new idea now is to implement a custom GstPluginFeature, that allows creating devices probing instances for elements from the registry (similar to GstElementFactory). This is planned to be integrated really soon now and should be in 1.4.

Summary

In practise this currently means that it’s possible to use gst-vaapi transparently in playbin, without the application having to know about anything VAAPI related. playbin will automatically select correct decoders and sinks and have them use in an optimal way together. This even works inside a WebKit HTML5 website together with fancy CSS3 effects.

gst-omx and the v4l2 plugin can now provide zerocopy operation, a slow ARM based device like the Raspberry Pi can decode HD video in realtime to EGLImages and display them, gst-plugins-gl has a solution to all OpenGL related threading problems, and many many more.

I think with what we have in GStreamer 1.2 we should be able to integrate all kinds of hardware in an optimal way and have it used transparently in pipelines. Some APIs might be missing, but there should be nothing that can’t be added rather easily now.

New job (and company)

I have a new job! After working for 5+ years for Collabora, I decided that it’s time for a change and have quit my contract there early September. These have been great years, working with so many brilliant people on Free Software, but time to move on!

So for the future, it’s actually more than just a new job. Together with fellow GStreamer hackers Tim-Philipp Müller and Jan Schmidt we founded a new company: Centricular Ltd.. We are going to provide consultancy services around Free Software with a focus on GStreamer, Multimedia and Graphics initially (but not exclusively). Technically, not much will change in the kind of work I’m doing compared to the past, but this time we will handle everything ourselves so we can better serve the needs of our customers and are more flexible. Hopefully this also provides an even better work environment within a group of equals and gives us more time to contribute to GStreamer and other Free Software projects.

Check our website for a list of Open Source technologies we cover and services we will offer. Obviously this list is not complete and we will try to broaden it over time, so if you have anything interesting that is not listed there but you would need someone for, just ask.

We will also be at the GStreamer Conference in Edinburgh next week.

Great times ahead! 🙂

Streaming GStreamer pipelines via HTTP

In the past many people joined the GStreamer IRC channel on FreeNode and were asking how to stream a GStreamer pipeline to multiple clients via HTTP. Just explaining how to do it and that it’s actually quite easy might not be that convincing, so here’s a small tool that does exactly that. I called it http-launch and you can get it from GitHub here.

Given a GStreamer pipeline in GstParse syntax (same as e.g. gst-launch), it will start an HTTP server on port 8080, will start the pipeline once the first client connects and then serves from a single pipeline all following clients with the data that it produces.

For example you could call it like this to stream a WebM stream:

Note that this is just a simple example of what you can do with GStreamer and not meant for production use. Something like gst-streaming-server would be better suited for that, especially once it gets support for HLS/DASH or similar protocols.

Now let’s walk through the most important parts of the code.

The HTTP server

First some short words about the HTTP server part. Instead of just using libsoup, I implemented a trivial HTTP server with GIO. Probably not 100% standards compliant or bug-free, but good enough for demonstration purposes :). Also this should be a good example of how the different network classes of GIO go together.

The HTTP server is based on a GSocketService, which listens on a specific port for new connections via a GLib main context, and notifies via a signal whenever there is a new connection. These new connections are provided as a GSocketConnection. These are line 424 and following, and line 240 and following.

In lines 240 and following we start polling the GIOStream of the connection, to be notified whenever new data can be read from the stream. Based on this non-blocking reading from the connection is implemented in line 188 and following. Something like this pattern for non-blocking reading/writing to a socket is also implemented in GStreamer’s GstRTSPConnection.

Here we trivially read data until a complete HTTP message is received (i.e. “\r\n\r\n” is detected in what we read), which is then parsed with the GLib string functions. Only GET and HEAD requests are handled in very simple ways. The GET request will then lead us to the code that connects this HTTP server with GStreamer.

Really, consider using libsoup if you want to implement an HTTP server or client!

The GStreamer pipeline

Now to the GStreamer part of this small application. The actual pipeline is, as explained above, passed via the commandline. This is then parsed and properly set up in line 362 and following. For this GstParse is used, which parses a pipeline string into a real GstBin.

As the pipeline string passed to http-launch must not contain a sink element but end in an element with the name “stream”, we’ll have to get this element now and add our own sink to the bin. We do this by getting the “stream” element via gst_bin_get_by_name(), setting up a GstGhostPad that proxies the source pad of it as a source pad of the bin, and then putting the bin created from the pipeline string and a sink element into a GstPipeline, where both (the bin and the sink) are then connected.

The sink we are using here is multisocketsink, which sends all data received to a set of aplication-provided GSockets. In line 390 and following we set up some properties on the sink that makes sure that newly connected clients start from a keyframe and that the buffering for all clients inside multisocketsink is handled in a sensible way. Instead of letting new clients wait for the next keyframe we could also explicitly request the pipeline to generate a new keyframe each time a client connects.

Now the last part missing is that whenever we successfully received a GET request from a client, we will stop handling any reads/writes from the socket ourselves and pass it to multisocketsink. This is done in line 146. From this point onwards the socket for this client is only handled by multisocketsink. Additionally we start the pipeline here for the first client that has successfully connected.

I hope this showed a bit how one of the lesser known GStreamer elements can be used to stream media to multiple clients, and that GStreamer provides building blocks for almost everything already 😉

GStreamer 1.2.0 released and other updates

This blog post might be a bit late but nonetheless, here are some updates that happened in GStreamer in the last week 🙂

GStreamer 1.2.0 released

On Tuesday we released GStreamer 1.2.0. This was just in time for GNOME 3.10, which depends on it and was released on Wednesday (Congrats btw! Many great improvements).

The changes between GStreamer 1.0 and 1.2 can be seen from the release announcement mail. In summary there are many, many nice new features, API improvements and even more bugfixes that we did not include in the 1.0.x bugfix releases because they required larger changes.

Following the 1.2.0 release we plan to do regular 1.2.x bugfix releases with important bugfixes, while at the same time develop new features and in general more complicated changes in the already opened 1.3 development release series (which is the GIT master branch now). The 1.3 release series will later lead to the next stable 1.4 release series. Needless to say that 1.4 will be completely backwards-compatible with 1.2 and 1.0.

GStreamer 1.2.0 binaries

Two days later on Thursday we then released the binaries for GStreamer 1.2.0 for Android (ARM), Mac OS X and Windows. This will allow to take advantage of all the new changes on these platforms too and provide a better cross-platform multimedia experience.

We also finally moved cerbero, the build system used to generate the binaries for all these platforms, to GStreamer’s freedesktop.org GIT and got rid of all the ancient 0.10 cruft that was still in there. This GIT repository is now the official place from where you can get the latest version of cerbero for building GStreamer, and will also be updated for all future 1.2.x bugfix releases. Hopefully it will also be kept up-to-date for GIT master, i.e. the 1.3 development release series.

iOS binaries and build support

Over the last days I also worked on porting the remaining parts of GStreamer 1.0 to iOS (5.1 and newer versions), and also adapted the cerbero build system to automatically build GStreamer 1.0 for iOS and create an installer package from it. iOS binaries were already available in the past from a third-party website (gstreamer.com) but only for the old and deprecated 0.10 version. My changes are based on this.

It’s now also possible to build GStreamer and all its dependencies via cerbero with the Xcode 5.0 toolchain for OS X and iOS. Thanks to more strict compiler checks and behaviour changes, some changes in the build systems and the code were necessary.

Official binaries for iOS will be provided by the GStreamer project in the next days.

Others

gstreamer-vaapi was already ported to the new 1.2 API for better hardware integration and the next release should contain it. Finally it can do all that without using unstable GStreamer API.

Also gnonlin, gst-editing-services and gst-python finally saw a 1.x based (pre-) release. The final 1.2.0 releases should be there in the next days, and hopefully a new PiTiVi release will follow soonish.

GStreamer conference 2013

In October the GStreamer Conference 2013 will happen in Edinburgh. There will be many interesting talks by the GStreamer developers but also other people and companies, who will give talks about what they achieved with GStreamer.

I’ll give a talk about how to integrate hardware capabilities into GStreamer, and which new APIs were added to make this much easier or possible at all (depending on what kind of integration is needed).

New Blog

Soo… after about 10 years I’m having a blog again. I hope I can keep it updated a bit more often than in the past 🙂

Here I’m planning to write about various topics that seem worthwhile to write about, including Free Software, coding, work related things, food, life in general or whatever else comes to my mind.