Parallelizing PNG compression, part 4: patching GNOME

Continuing last week’s series on parallelizing things to speed up saving screenshots as PNGs on GNOME!

Going back to the original GNOME desktop screenshot issue: there’s a long delay after hitting “PrintScreen” before audiovisual feedback confirms that a screenshot has been taken, especially at very high resolutions and a slower CPU.

The modular GNOME

Screenshot operations in the GNOME desktop environment are split over two modules: gnome-shell (which as the compositor has access to the raw screen data under both X11 and Wayland modes), and gnome-settings-daemon (gsd) which has some kind of facility for global keyboard shortcuts.

Current order of operations is thus:

  1. gsd’s keyboard shortcut handler for screenshots makes a call to the shell over D-Bus
  2. shell captures the screen
  3. shell compresses the image and saves it to disk
  4. shell starts a visual “flash” effect
  5. shell returns a success value to gsd over D-Bus
  6. gsd starts the “click” sound from the audio theme

Mysteriously the sound and the visual effects are started in different processes, but at about the same time so they feel synchronized. Just after a long delay from my keypress!

Fixing it

The proposed fix I outlined in a previous post seems pretty feasible, and I was able to whip up a proof of concept:

But how do you test that?

Building and testing GNOME Shell

GNOME is a pretty big environment with a lot of moving parts, and can take a few hours to build. Building a single package from the latest version control is difficult because a lot of components need to be in sync!

There are two (at least) systems for building the entirety of GNOME for testing: BuildStream is the current recommended system, and is pretty awesome, but you can’t test some components like the shell.

The older system is jhbuild, which has been around for a few years and I have some bad memories of weird broken builds in the distant past. :D

Note that BuildStream is very self-contained and produces an isolated system set, while jhbuild grabs a lot of dependencies from the host system and can be more … fragile.

Build gotchas: WebKitGtk+

When I first tried building with BuildStream, I complained that a very slow portion of the build was WebKitGtk+ being limited to only 2 threads on my 8-core machine.

But when I tried again with jhbuild, I found that no such limit was applied — it was happily running 16 simultaneous compiles (remember hyper-threading!) and …. running …. out of memory. Some of the giant “unified build” C++ files in WebKit were eating up ~1.5 GiB of RAM each, and at 16 threads that goes wwaayy past the 12 GiB available on the system!

I added some swap space, and it ran further but vveerryy sslloowwllyy.

WebKitGtk’s build is with meson (ninja) which defaults to running enough build threads to fill your available CPUs. There’s a command line option (-j) to override it, but I couldn’t figure out offhand how to change the command line that gets invoked by jhbuild’s meta-build system. (There’s not an environment variable to override it, which would have been easier to do I think.)

As a quick hack I temporarily disabled hyper-threading in the BIOS to reduce the number of logical CPU cores, and thus autodetected processes, so only half as much RAM was used. :P

This got the build through WebKit and I was able to re-enable HT and eventually get to patching gnome-shell.

Testing the shell

To properly test the shell and daemons you need to open a new gdm desktop session running the custom-built stuff, but I’m not really sure how to do that yet. :)

So far I’ve tested just the shell component, which you can swap out for the currently running shell like so:

$ jhbuild run gnome-shell --x11 -r

Beware that if it breaks, you have no shell / window manager anymore and it’s…. a bad experience.

Be very careful that you can restart the shell with the mouse or something, as keyboard focus breaks when the shell dies! :D

Note that running the patched gnome-shell with the system’s unpatched gnome-settings-daemon produces a nicely timed “click” + “flash” and then a second “click”. :) If the two modules are run both patched, the second click will be removed.

One more thing

Doing more thorough benchmarking for our next post, I noticed that the actual save operation in the shell (via gdk-pixbuf, via libpng) is also 30% slower than it needs to be because gdk-pixbuf enables an unnecessary transform operation in libpng.

Fix for gdk-pixbuf is submitted, and merged!

Coming up: getting Rust-y with mtpng

I’ve also made great strides on the actual encoding parallelization. More to come!