Tuesday 22 October 2013

Reducing wake-ups, the 2013 edition

While doing work on UPower, trying to reduce its wake-ups when idle, I realised that a couple of applications on my desktop were waking up far more than should be necessary.

Detecting wake-ups

First, we need to detect wake-ups. This is a fairly well-known, and old, process, using the venerable PowerTop.

# powertop --html=/tmp/foo.html --time=60

You can increase the number of seconds to get a more realistic view of your machine's idleness, but this is enough to show the main culprits.

In my case, evolution, gnote and devhelp were all waking up about 30 times a second whilst mostly idle. Evolution might be an outlier, as it also talks to the network, and is a bigger application to debug, so I started with devhelp.

$ strace -vvv -p `pidof devhelp`
Process 19069 attached
restart_syscall(<... resuming interrupted call ...>) = 0
recvfrom(6, 0x23fece4, 4096, 0, 0, 0)   = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=3, events=POLLIN}], 3, 17) = 0 (Timeout)
recvfrom(6, 0x23fece4, 4096, 0, 0, 0)   = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=3, events=POLLIN}], 3, 17) = 0 (Timeout)

And the screen fills up with EAGAIN errors. This looks a lot like a a timeout being called too often.

Debugging wake-ups

I started sprinkling debug in g_main_context_prepare(), the function that prepares the various timeout and idle sources for dispatch, and calculates the timeouts for each poll() operation.

Something like:
if (source_timeout > 0 && source_timeout <= 20)
  g_message ("Source '%s' has very low timeout %d", g_source_get_name (source), source_timeout);

The problem is we end up getting a null source name for almost all of the sources. This is where g_source_set_name() and its sibling g_source_set_name_by_id() come in handy.

timeout_id = g_timeout_add (timeout, myfunction, mydata);
g_source_set_name_by_id (timeout_id, "[module-name] myfunction");

And we start doing that all over GTK+. As you can see from the patches in the bug, there's not just timeouts added by g_timeout_add() that we need to name.

In custody

The huge amount of debug shown when running our application with the gmain.c debug above tells us:
GLib-Message: Source 0x2d4f380 '[gtk+] gdk_frame_clock_paint_idle' has very low timeout 17
even when the window doesn't change, is in the background, and not updating. About 30 times a second.

Who are you gonna call?

Or by whom have you been called, rather. This is a small section of my ~/.gdbinit which will break on a particular function, print a backtrace, and continue. It makes it easier to interact with the logs after the fact, especially if they are calls that happen often and you're not interested in all the calls.

set breakpoint pending on
break gdk_frame_clock_begin_updating
commands
bt
continue
end

We did the same for gdk_frame_clock_begin_updating and found a backtrace similar to the one in Bugzilla. We only needed to start reading some code after that, and figuring out what was going on. The result was a bug in GTK+, likely a regression from GTK+ 3.8.

Your laptop should last a bit longer when the updates hit.

TL;DR

Name your timeouts with g_source_set_name_by_id(), run powertop, and file bugs against broken applications.

Update: Fixed powertop command-line.

Thursday 17 October 2013

More power management changes

As is becoming common, we will have some more power management changes in GNOME 3.12, though those changes will also affect other desktops, whether they use UPower's D-Bus interface, or libupower-glib, the helper library.

The goals of the exercise were simple:

  • reduce wake-ups on the daemon and on the client side
  • reduce code duplication amongst desktop environments, and even within the same environment (composite battery, anyone?)
  • moving some policy actions to a lower level (one could not request hibernation or suspend when multiple users were logged in without interaction and passwords)
All those changes are now in UPower master ready for testing.

Out with the old

The deprecated interfaces for Suspend, Hibernate, etc. are finally removed, after being obsoleted by logind. We've also removed the QoS interface that nobody was using, and the out-dated battery recall support. It's not that batteries don't explode any more, it's that they don't all come from known-bad batches.

In with the new

We have 2 new properties on each of the devices.

WarningLevel which uses daemon-side configurations to tell you whether a device's battery level is low, critically low, or whether we're about to take action on that critical level.

We also have IconName, which replaces some cut'n'pasted code between desktop components. If your desktop environment has many more icons for all types of devices on low battery, for example, you can ignore this property and use the code you always have.

Using those new properties usefully is the new DisplayDevice object. It groups all the batteries and UPSes in the daemon into one, easy to use object that you can use to display a single status icon in your shell chrome. Obviously, if you want to show more devices, the individual batteries and UPSes are still available through the usual means. And it obviously has the 2 new properties mentioned above, so your session daemon can get told when to show notifications for low batteries.

And finally, using that new combined DisplayDevice is the critical battery action policies. As mentioned above, multi-user systems could not hibernate without requiring the user to enter an administrator password, which is less than convenient when your machine is running out of UPS power fast. The configuration for that policy is now in the daemon itself, with sane defaults, and it will hibernate the machine for you.

And to the modernisation

libupower-glib now uses GDBus, even if the daemon doesn't. The daemon however sends PropertiesChanged signals which means that modern D-Bus bindings will automatically get the new values for properties, instead of polling the daemon. The DeviceChanged and Changed signals have thus been removed.

API changes

They are numerous, too many to mention here. I've posted to the device-kit mailing-list with a list of changes that were made, reply there if you have any questions regarding using UPower in your application or session daemons.

Miscellaneous

systemd >= 207 will save your brightness settings across reboots, and the upcoming systemd 209 will have support for saving keyboard backlight across reboots.

I've made attempts at supporting Intel Rapid Start in systemd, but this will actually require kernel changes. Hopefully we should be able to land this by the time GNOME 3.12 is released.