Parallelizing PNG, part 5: choosing Rust for mtpng

In my last post I wrapped up the patches to improve perceived performance of screenshots on the Linux GNOME desktop. With that done, why not implement my crazy plan for parallel PNG encoding to speed the actual save time?

Before starting any coding project, you need to choose a language to implement it in. This is driven by a number of considerations: familiarity, performance, breadth and depth of the standard library, and whether it’ll fit in with whatever eventual deployment target you have in mind.

Now C here

I’ve done a fair amount of coding in C, it’s well known for performing well, and it’s the default for most of the components in GNOME, so that’s a plus… but its standard library is small, its type and macro systems are limited so repetitious code is common, and memory management is notoriously unreliable — especially in multithreaded situations. I would either have to implement some threading details myself, or find a library to use… which might introduce a dependency to the GNOME platform.

There’s also C++, C’s weird cousin. Templates are way more powerful than C macros, but I have trouble wrapping my head around the details, and error messages expose a lot of details. Memory management and, in latest versions threading, are a big improvement! But it adds the C++ standard library as a dependency, which might or might not fly for GNOME.

My usual go-to languages are PHP and JavaScript, neither of which are suitable for a high-performance binary-parsing system component. :)

Getting Rust-y

But what about Rust? It’s being used by some other GNOME components such as librsvg which renders SVG icons (and is also used by us at Wikimedia for rendering thumbnails of SVG icons, diagrams, and maps!), and can be integrated into a C-compatible library without introducing runtime or install-time dependencies.

Rust’s type system is powerful, with a generics system that’s in some ways more limited than C++ templates but much easier to grok. And it also has a macro system that can rewrite bits of code for you in a way that, again, I find a lot easier to comprehend than C++ templates and is much more powerful than C macros.

And the memory management is much cleaner, with a lot of compile-time checking through a “borrow checker” system that can identify a HUGE number of memory-misuse problems long before you get into the debugger.

Plus, Rust’s community seems friendly, and they’ve got a good package manager system that makes it easy to pull in compile-time dependencies to fill in gaps in the standard library.

I’ve been keeping an eye on Rust for a couple years, reading a lot of docs and some books, and half-starting a few tiny test projects, but never had a project in front of me that seemed as good a time to start as this!

Well that’s just crate

Packages in Rust are called “crates”, and the package manager is named “cargo”. A crate can contain a library, a binary, or both, and can have tests, docs, examples, and even benchmarks that can be run by the package manager.

Creating a stub library crate via ‘cargo init –lib’ is fairly straightforward, and I was able to immediately have something I could compile and have tests run on with ‘cargo test’!

As an editor I’m using Visual Studio Code, which has an “rls” package which provides Rust support including inline display of compiler warnings and errors. This is a *huge* improvement versus manually switching to a terminal to run compiles to check for errors, especially as cascading errors can scroll the initial cause of your error off the screen. ;) But the error messages themselves are usually clear and helpful, often suggesting a fix for common problems.

I’m also able to run my tests on Linux, macOS, and Windows machines without any additional work, as Rust and cargo are pretty portable.

From there, I was able to start adding tiny parts of my plan, bit by bit, with confidence.

Atypical types

Rust’s type system has a number of quirks and interesting points, but the most important kinds of things are structs and enums.

Structs are like structs or classes in C++, kinda sorta. Their fields can be exposed publicly or private to the module, and they can have methods (more or less). Here’s one from the project so far:

#[derive(Copy, Clone)]
pub struct Options {
    pub chunk_size: usize,
    pub compression_level: CompressionLevel,
    pub strategy: Strategy,
    pub streaming: bool,
    pub filter_mode: FilterMode,

The “derive(Copy, Clone)” bit tells the compiler to treat the struct as something that can be trivially copied, like a primitive value. Not always what you want for a struct; I may remove this in my final API. The fields are also exposed directly with “pub”, which I may change to a builder pattern and accessors. But it’s an easy way to get started!

An associated “impl” block adds your associated functions and methods. The compiler checks that you initialized all fields; it’s impossible to end up with uninitialized memory by accident.

impl Options {
    // Use default options
    pub fn new() -> Options {
        Options {
            chunk_size: 128 * 1024,
            compression_level: CompressionLevel::Default,
            strategy: Strategy::Default,
            filter_mode: FilterMode::Adaptive,
            streaming: true,

Meanwhile, Rust enums can be made to work like a C enum, which is mostly a list of constant values, but they can also be a very powerful construct!

Here’s a boring enum example which maps names for the PNG byte filters to their codes:

#[derive(Copy, Clone)]
pub enum FilterType {
    None = 0,
    Sub = 1,
    Up = 2,
    Average = 3,
    Paeth = 4,

And here’s one that’s more interesting:

#[derive(Copy, Clone)]
pub enum FilterMode {

This means a FilterMode can be either the value “Adaptive” or the value “Fixed”… with an associated sub-value of type FilterType. These are kind of like C unions, but they’re not awful. ;)

That fancy enum pattern is used everywhere in Rust, with the Option<T> enum (either None or Some(T)) and Result<T, E> (either Ok(T) or Err(E)). In combination with the “match” construct this makes for great parsing of command line options into optional settings…

options.filter_mode = match matches.value_of("filter") {
    None             => FilterMode::Adaptive,
    Some("adaptive") => FilterMode::Adaptive,
    Some("none")     => FilterMode::Fixed(FilterType::None),
    Some("up")       => FilterMode::Fixed(FilterType::Up),
    Some("sub")      => FilterMode::Fixed(FilterType::Sub),
    Some("average")  => FilterMode::Fixed(FilterType::Average),
    Some("paeth")    => FilterMode::Fixed(FilterType::Paeth),
    _                => return Err(err("Unsupported filter type")),

There’s a little redundancy of namespaces that I can clean up with better naming there, but you get the idea. Note you can even match on an enum containing a string!

Threading with Rayon

Another big concern of mine was being able to do threading management easily. Rust’s standard library includes tools to launch threads and safely send ownership of data between them, but it’s a little bare bones for running a threadpool with a queue of work items.

The Rayon library was highly recommended; it includes both a really solid ThreadPool class that you can drop function closures into, and higher-level constructions building on top of iterators to, for instance, split a large arra-based process into small chunks.

I ended up not using the fancy iterator systems (yet?) but the ThreadPool is perfect. Combined that with message passing from the standard library to send completed blocks back to the main thread for processing, and I was able to get a working multithreaded app going in about two days, with my bugs almost entirely about my filtering, compression, and output file structure, and not the threading. :)

Deflation for once

The last main library component I needed was the actual data compression — I can implement image filtering and chunk output myself, and it was easy to grab a crc32 library via a crate. But the ‘deflate’ compression algo looks hard, and why reimplement that if I don’t have to?

There are several ‘deflate’ and ‘gzip’ implementations written in Rust available as crates, but none of them included all the abilities I needed for parallel chunked compression:

  • overriding the sliding-window dictionary with the previous chunk’s last 32 KiB of uncompressed data to improve compression across chunks
  • flushing output at the end of each chunk without marking the last block as end-of-stream

Possibly I’ll find some time later to figure out how to add those features. But for now, I used the C zlib library (the same one used by the C libpng library!) through an API adapter crate that exposes the absolute minimum C API of zlib as Rust functions and types.

It shouldn’t introduce dependencies since zlib is already used by …. lots of stuff. :D

So how’s it working out?

Pretty well. Check out my mtpng project over on github if you like!

I got it mostly working for truecolor images in about 3 days, and did some performance tuning later in the week that sped it up enough it’s often beating libpng on a single thread, and always beats it using multiple threads on large files.

The API needs work before I publish it as a crate, so I’ll keep fiddling with it, and I haven’t yet figured out how to implement the C API but I know it can be done. I’ve also got a long list of todos… :)

I’ll add more blog posts soon on some interesting details…