Memory leaks and DB_DataObject

As part of my work restructuring the background daemons, I’m starting to do memory profiling on StatusNet… I’d like to get it to the point where we can enable a debug mode that just snapshots out how much memory each class and global takes up while you’re running.

I’ve already found some low-hanging fruit in DB_DataObject. This is a “clever” library that likes to stash a lot of stuff into a global $_DB_DATAOBJECT array, including all your query result data.

Sure, you’ve got a free() method, but nobody calls it consistently…

I found I could quite easily drop a destructor onto our Memcached_DataObject intermediate class which calls the free() method when the object is destroyed (explicitly with an unset(), or implicitly by falling out of scope and no longer being referenced).
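A minimal sketch of the idea, not the committed code: FakeDataObject and the FAKE_RESULTS global are hypothetical stand-ins for DB_DataObject and its $_DB_DATAOBJECT array, and SelfFreeingDataObject plays the role of the intermediate class. The only part that matters is the destructor.

```php
<?php
// Stand-in for the library's global result cache (assumption: the real
// library keeps its data in $_DB_DATAOBJECT with a different layout).
$GLOBALS['FAKE_RESULTS'] = array();

// Hypothetical stub playing the role of DB_DataObject.
class FakeDataObject
{
    protected $resultId = null;

    public function find()
    {
        $this->resultId = spl_object_hash($this);
        // Pretend the query results get stashed globally, where they
        // outlive the object unless someone frees them.
        $GLOBALS['FAKE_RESULTS'][$this->resultId] = str_repeat('x', 1024);
    }

    public function free()
    {
        if ($this->resultId !== null) {
            unset($GLOBALS['FAKE_RESULTS'][$this->resultId]);
            $this->resultId = null;
        }
    }
}

// Plays the role of the intermediate class: the destructor body is the
// one-line fix.
class SelfFreeingDataObject extends FakeDataObject
{
    public function __destruct()
    {
        $this->free();
    }
}
```

PHP runs the destructor only once the last reference to the object is gone, so anything that keeps a reference around (a cache, a static, a long-lived array) will still hold the result data.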

My queue-handler daemon was able to get through an average of 595 notices on my stress test before croaking on a 32M memory_limit, but jumped up to get through 761 notices simply by adding this one-line destructor. (Average of 5 runs.)

Not bad for a start!

I’ve committed the destructor to the 0.9.x branch.

L10n for StatusNet plugins

Now that we’ve got UI translations going pretty well for StatusNet core via TranslateWiki, I’d like to see if we can get things working for plugins as well before the 0.9 final release.

I’d like some quick feedback from folks before merging into 0.9.x; you can see my work branch here.

In core code, to make a string translatable we wrap it in one of the gettext functions like this, most often the _() shortcut:

 $this->text(_("You have a new message."));

If you’re a plugin though, you need to add your own translations into the mix, and gettext makes you tell it which “domain” you’re pulling from each time…

 // at init time
 bindtextdomain("AwesomePlugin",
                dirname(__FILE__) . "/locale");
 ...
 $this->text(dgettext("AwesomePlugin",
                         "You have a new message."));

Repeating your plugin name for every string gets old REAL fast!

In my work branch I’ve added a new gettext wrapper function _m() which knows if it’s been called from a plugin, and picks out the right domain for you based on the plugin directory.

So rather than playing around with bindtextdomain() and dgettext() manually, you can just add one character and not worry about the rest:

 $this->text(_m("You have a new message."));
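One way such a wrapper could work out its domain is to look at the file it was called from. The sketch below is an illustration under assumptions, not the code in the work branch: the function names are hypothetical, plugins are assumed to live under a ".../plugins/Name/" directory, and "statusnet" is assumed as the fallback domain for core.

```php
<?php
// Hypothetical helper: map a source file path to a gettext domain.
// Assumes a ".../plugins/<Name>/..." layout; anything else is treated as core.
function sketch_domain_for_path($path)
{
    $path = str_replace('\\', '/', $path);
    if (preg_match('#/plugins/([^/]+)/#', $path, $m)) {
        return $m[1];
    }
    return 'statusnet';
}

// Hypothetical helper: pick the domain for whichever file called us.
function sketch_caller_domain()
{
    // Frame 0 describes the call to this function; its 'file' entry is
    // the file where that call was made.
    $trace = debug_backtrace();
    $file = isset($trace[0]['file']) ? $trace[0]['file'] : '';
    return sketch_domain_for_path($file);
}
```

Called from plugins/AwesomePlugin/AwesomePlugin.php, sketch_caller_domain() would come back with "AwesomePlugin", which is the value you would otherwise have to pass to dgettext() by hand.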

It also knows how many parameters you passed to it, so instead of whipping out ngettext() or dngettext() (or god forbid dnpgettext!) you can just keep using the nice compact _m() when your needs get fancier:

Plurals:

 $this->text(sprintf(_m("You have a new message.",
                        "You have %d new messages.",
                        $messageCount),
                     $messageCount));

Contexts for disambiguation:

 // read vs delete
 $this->text(_m("message-action", "Read"));
 
 // read vs unread
 $this->text(_m("message-state", "Read"));

Plurals with contexts!

 $this->text(_m("message-state",
                "Read",
                "Read",
                $messageCount));
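The dispatch on argument count could look roughly like the sketch below. It is a guess at the shape, not the branch's code: sketch_m() and its helpers are hypothetical names, and the domain is passed in explicitly here to keep the example self-contained. PHP's gettext extension has no context-aware functions, so contexts are emulated with the GNU gettext convention of keying the message as context, then "\x04", then the message ID.

```php
<?php
// Fallbacks so the sketch runs without the gettext extension; with them,
// strings simply come back untranslated.
if (!function_exists('dgettext')) {
    function dgettext($domain, $msg) { return $msg; }
    function dngettext($domain, $one, $many, $n) { return $n == 1 ? $one : $many; }
}

// Context lookup: if no translation exists, gettext hands the key back,
// so strip the context off again and return the bare message.
function sketch_pgettext($domain, $context, $msg)
{
    $key = $context . "\x04" . $msg;
    $out = dgettext($domain, $key);
    return ($out === $key) ? $msg : $out;
}

// Same trick for plurals with a context.
function sketch_npgettext($domain, $context, $one, $many, $n)
{
    $keyOne = $context . "\x04" . $one;
    $keyMany = $context . "\x04" . $many;
    $out = dngettext($domain, $keyOne, $keyMany, $n);
    if ($out === $keyOne) {
        return $one;
    }
    if ($out === $keyMany) {
        return $many;
    }
    return $out;
}

// 1 arg: message; 2: context + message; 3: singular, plural, count;
// 4: context, singular, plural, count.
function sketch_m($domain, ...$args)
{
    switch (count($args)) {
    case 1:
        return dgettext($domain, $args[0]);
    case 2:
        return sketch_pgettext($domain, $args[0], $args[1]);
    case 3:
        return dngettext($domain, $args[0], $args[1], $args[2]);
    case 4:
        return sketch_npgettext($domain, $args[0], $args[1], $args[2], $args[3]);
    default:
        throw new InvalidArgumentException('Expected 1 to 4 message arguments');
    }
}
```

The count alone is enough to tell the four cases apart, which is why the caller never has to pick between the d*gettext variants.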

Plugins maintained in the main StatusNet repo shouldn’t need to worry about anything else — the .po file templates and updates via TranslateWiki will be handled through the same workflow as the core. (I’m working with Siebrand at TranslateWiki to make sure we can automate this as much as possible, including adding new plugins.)

scripts/update_po_translations.php can regenerate all (or just core, or just plugin) .po templates for those who want to push them in immediately, though.

As with core, the binary .mo files won’t be included in the git repository to simplify code maintenance. I’ve added a Makefile at the base level which’ll build all the .mo files for folks to test localization in their working copies (or to build a release!)

Confusing Twitter settings

I’ve been getting feedback for some time that this checkbox in StatusNet’s Twitter connect settings is confusing:

Subscribe to my Twitter friends here.

People tend to assume this will pull all your Twitter friends’ updates into your timeline on the SN site… but that is actually a separate checkbox, when enabled (sorry, it’s still disabled on identi.ca):

Import my Friends Timeline.

I suspect we could come up with clearer wording for both of these… let’s start the bidding!

  [x] Subscribe to my Twitter friends’ accounts on %%site.name%% automatically.

  [x] Show my friends’ tweets here on %%site.name%%.

Any other ideas?

StatusNet on TranslateWiki

Ok, we’ve got localization for StatusNet set up on TranslateWiki — setup was pretty straightforward and they’re excited about supporting us. Big thanks to Siebrand Mazeland for getting a test setup in yesterday and the live setup today! [And of course thanks to GerardM for pimping out TranslateWiki every chance he gets — lots of cool stuff going on there now!]

Portal page for our project:
http://translatewiki.net/wiki/Translating:StatusNet

There’s still some rough edges in the display for gettext-sourced translations, and I want to make sure we’ve got the update process going cleanly, but it seems to be working pretty well. We’ve already had some fixes to the German translation, and I’ve confirmed that I can pull and commit updates from TranslateWiki into git — yay!

To get set up, create yourself a login and hit up the requests page to get your edit permissions confirmed.

If you need to grab TranslateWiki folks live to poke something, right now the best place is the #mediawiki-i18n channel on irc.freenode.net; Siebrand and Nikerabbit are our main contacts there. There’s also a web gateway to the channel.

I’m also collecting some ideas for improvements to the interface and process on my user page.

— brion vibber (brion @ status.net)

StatusNet daemon rearchitecture

Some of StatusNet’s awesomer features like the XMPP and Twitter bridges require running background daemons to watch event queues or keep connections to XMPP servers open.

Alas, this just isn’t going to scale to the future StatusNet 1.0 world where we’re going to be running thousands of instances for our hosted services. We see a lot of problems with memory leaks and instability in those daemons with even just a few live sites now…

I’ve worked out what I think is a feasible architecture for a more scalable queue/daemon system for big sites to use; details and some diagrams I whipped up to help me keep my head straight are up on the wiki:

[Diagrams: old architecture, new architecture]

I’m planning to use a single lightweight master daemon which will maintain long-running connections to the ActiveMQ queue and XMPP daemons. Actual event handling will still be done at the PHP/StatusNet level, but now as short-running processes which will handle a single event and exit.

This reduces the surface area for memory leaks and other oddities we encounter in long-running PHP scripts, and most importantly means we only have to run as many processes as there actually are events being handled!
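To make the "one event, one short-lived process" idea concrete, here is a rough sketch under assumptions. The planned master daemon may well not be written in PHP, the queue and XMPP connections are left out entirely, and dispatch_event() and the inline handler are hypothetical. It shows only the hand-off: the parent spawns a fresh PHP process, feeds it a single event on stdin, and collects its exit status.

```php
<?php
// Hypothetical one-shot handler: reads one JSON event from stdin,
// "handles" it, and exits. A real handler would load StatusNet instead.
define('HANDLER_CODE',
    '$e = json_decode(file_get_contents("php://stdin"), true); echo "handled " . $e["type"];');

// Spawn a short-lived PHP process for a single event.
// Returns array(exit status, handler output).
function dispatch_event(array $event)
{
    $cmd = escapeshellarg(PHP_BINARY) . ' -r ' . escapeshellarg(HANDLER_CODE);
    $spec = array(
        0 => array('pipe', 'r'), // child's stdin
        1 => array('pipe', 'w'), // child's stdout
    );
    $proc = proc_open($cmd, $spec, $pipes);
    if (!is_resource($proc)) {
        throw new RuntimeException('Could not start handler process');
    }
    fwrite($pipes[0], json_encode($event));
    fclose($pipes[0]);
    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    // The child's memory goes back to the OS here, leaks and all.
    $status = proc_close($proc);
    return array($status, $output);
}
```

Whatever the handler leaks dies with the child process, so the long-running parent stays small.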

Please feel free to give some feedback before I jump into implementation. :)

— brion vibber (brion @ status.net)
Senior Software Architect, StatusNet
San Francisco

PubSubHubbub and microblogging

I’m poking about at the Realtime Web Summit, and just got out of the PubSubHubbub session.

PuSH is a relatively lightweight protocol for pushing feed updates in more or less real-time using standard web protocols (eg, HTTP!). As currently spec’d, PuSH covers the server-to-server replication space pretty well — publishers send their updates to hubs, which send them on to the callback URLs given by subscribers.
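The fan-out can be pictured with a toy in-memory hub. This is only an illustration of the flow described above, not the PuSH wire protocol: the ToyHub class is hypothetical, and it returns the deliveries a real hub would make as HTTP POSTs rather than sending anything.

```php
<?php
// Toy in-memory hub: illustrative only, no HTTP involved.
class ToyHub
{
    // topic URL => list of subscriber callback URLs
    private $subscribers = array();

    public function subscribe($topic, $callback)
    {
        $this->subscribers[$topic][] = $callback;
    }

    // A publisher pings the hub with new content; the hub fans it out.
    // Returns one delivery per subscriber of the topic.
    public function publish($topic, $entry)
    {
        $callbacks = isset($this->subscribers[$topic])
            ? $this->subscribers[$topic]
            : array();
        $deliveries = array();
        foreach ($callbacks as $callback) {
            $deliveries[] = array('post_to' => $callback, 'body' => $entry);
        }
        return $deliveries;
    }
}
```

Every subscriber here is identified by a callback URL the hub can reach, which is the assumption that stops working for NAT’d and mobile clients.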

For StatusNet, we’re really interested in two possible extensions to this, which would be outside the scope of the current PuSH spec:

Microblogging metadata extensions. PuSH deals with RSS and Atom feeds, but doesn’t really care what’s inside them. Microblogging and other social-type services will have various metadata — profile name & avatar, friend relations, ratings, comment id links, etc — which could be embedded into activity stream feeds, allowing different services to handle remote subscriptions interoperably.

Of course, we could just push everybody to support OMB… but the PuSH model may be more flexible, allowing subscriptions to blogs etc to be aggregated into your notice stream.

“Last mile” push to clients. There’s still no standardization for real-time communications from an aggregator or social service to end-user web, desktop, or mobile clients. PuSH as spec’d can’t handle this since it needs a URL to post updates to subscribers, which a NAT’d or mobile client obviously isn’t going to have.

A more or less standard way to attach XMPP or long-polling to pull updates from an aggregating hub to client end-points would be very nice, just as common use of the Twitter API has allowed interop between client apps and services (many Twitter clients will happily speak to identi.ca if you just change the API URL to https://identi.ca/api!). That third-party ecosystem is mostly restricted to polling, though, and could really benefit from interoperable methods for pushing updates to an open client.