Don’t spam me, bro

I got a pretty HTML spam email from Microsoft about the opening of their Mission Viejo, CA retail store today. This was probably not a smart marketing decision…

  1. Spam is bad; I don’t remember ever asking for this
  2. I haven’t lived in the area for 3 years. What old product reg info were they pulling from?
  3. All links in the mail run through the email marketing partner’s domain, so I’m never going to click them… haven’t they heard of phishing?

Although I have to admit, I’m tempted to swing by when I’m down that way visiting the folks for Christmas, just to see a Microsoft store… :)

StatusNet daemon rearchitecture

Some of StatusNet’s awesomer features like the XMPP and Twitter bridges require running background daemons to watch event queues or keep connections to XMPP servers open.

Alas, this just isn’t going to scale to the future StatusNet 1.0 world where we’re going to be running thousands of instances for our hosted services. We see a lot of problems with memory leaks and instability in those daemons with even just a few live sites now…

I’ve worked out what I think is a feasible architecture for a more scalable queue/daemon system for big sites to use; details and some diagrams I whipped up to help me keep my head straight are up on the wiki:

[Diagrams: old and new daemon architecture]

I’m planning to use a single lightweight master daemon which will maintain long-running connections to the ActiveMQ queue and XMPP daemons. Actual event handling will still be done at the PHP/StatusNet level, but now as short-running processes which will handle a single event and exit.

This reduces the surface area for memory leaks and other oddities we encounter in long-running PHP scripts, and most importantly means we only have to run as many processes as there actually are events being handled!
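To make the "one event, one short-lived process" idea concrete, here's a minimal sketch of the dispatch step. Everything in it is an assumption for illustration: the function name, the JSON event format, and the in-memory array standing in for what the master would actually receive from ActiveMQ or XMPP. It also runs the children one at a time, where a real master would overlap them. The array form of proc_open() needs PHP 7.4 or later.

```php
<?php
// Hypothetical sketch: names, event format, and the in-memory "queue" are
// illustrative assumptions, not StatusNet code.

/**
 * Handle one event in a fresh PHP process that exits when it is done,
 * so any leaked memory dies with the process.
 */
function handle_event_in_child(array $event): array
{
    // Code for the short-running worker: read one JSON event from stdin,
    // "handle" it (here: just report it), and exit.
    $workerCode = '$e = json_decode(file_get_contents("php://stdin"), true);'
                . 'echo "handled " . $e["type"] . " for site " . $e["site"];';

    $pipes = [];
    $proc = proc_open(
        [PHP_BINARY, '-r', $workerCode],
        [0 => ['pipe', 'r'], 1 => ['pipe', 'w']],
        $pipes
    );
    if (!is_resource($proc)) {
        throw new RuntimeException('could not start worker process');
    }

    // Hand the event to the child, then collect whatever it printed.
    fwrite($pipes[0], json_encode($event));
    fclose($pipes[0]);
    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    $exitCode = proc_close($proc);

    return ['exit' => $exitCode, 'output' => $output];
}

// Stand-in for events arriving over the master's long-lived connections.
$pending = [
    ['site' => 'example1', 'type' => 'xmpp-out'],
    ['site' => 'example2', 'type' => 'twitter-bridge'],
];

foreach ($pending as $event) {
    $result = handle_event_in_child($event);
    echo $result['output'], ' (exit ', $result['exit'], ")\n";
}
```

The point of the split is that only the small dispatch loop stays resident; the number of worker processes tracks the number of events in flight rather than the number of hosted sites.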

Please feel free to give some feedback before I jump into implementation. :)

— brion vibber (brion @ status.net)
Senior Software Architect, StatusNet
San Francisco

PubSubHubbub and microblogging

I’m poking about at the Realtime Web Summit, and just got out of the PubSubHubbub session.

PuSH is a relatively lightweight protocol for pushing feed updates in more or less real-time using standard web protocols (eg, HTTP!). As currently spec’d, PuSH covers the server-to-server replication space pretty well — publishers send their updates to hubs, which send them on to the callback URLs given by subscribers.
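As a rough illustration of the subscriber side, this sketch builds the form-encoded body a subscriber would POST to a hub. The helper name and URLs are made up, and the hub.* parameter names are taken from the PuSH draft as it stood in late 2009 as best recalled, so check them against whichever spec version you target. No request is actually sent.

```php
<?php
// Sketch only: function name and URLs are placeholders; the hub.* parameter
// names are assumed from the 2009-era PuSH draft and may differ in later versions.

/**
 * Build the application/x-www-form-urlencoded body for a subscription request.
 */
function build_subscribe_request(string $topicUrl, string $callbackUrl, string $verifyToken): string
{
    return http_build_query([
        'hub.mode'         => 'subscribe',
        'hub.topic'        => $topicUrl,     // the feed we want updates for
        'hub.callback'     => $callbackUrl,  // where the hub should POST new entries
        'hub.verify'       => 'sync',        // ask the hub to verify the callback right away
        'hub.verify_token' => $verifyToken,  // echoed back so we can match the verification
    ]);
}

echo build_subscribe_request(
    'http://example.org/feed.atom',
    'http://subscriber.example.com/push/callback',
    'abc123'
), "\n";
```

One PHP-specific wrinkle when receiving the hub's verification request: PHP rewrites dots in incoming parameter names to underscores, so hub.challenge shows up as $_GET['hub_challenge'].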

For StatusNet, we’re really interested in two possible extensions to this, which would be outside the scope of the current PuSH spec:

Microblogging metadata extensions. PuSH deals with RSS and Atom feeds, but doesn’t really care what’s inside them. Microblogging and other social-type services will have various metadata — profile name & avatar, friend relations, ratings, comment id links, etc — which could be embedded into activity stream feeds, allowing different services to handle remote subscriptions interoperably.

Of course, we could just push everybody to support OMB… but the PuSH model may be more flexible, allowing subscriptions to blogs etc to be aggregated into your notice stream.

“Last mile” push to clients. There’s still no standardization for real-time communications from an aggregator or social service to end-user web, desktop, or mobile clients. PuSH as spec’d can’t handle this since it needs a URL to post updates to subscribers, which a NAT’d or mobile client obviously isn’t going to have.

A more or less standard way to attach XMPP or long-polling to pull updates from an aggregating hub to client end-points would be very nice, just as common use of the Twitter API has allowed interop between client apps and services (many Twitter clients will happily speak to identi.ca if you just change the API url to https://identi.ca/api!). That third-party ecosystem is mostly restricted to polling, though, and could really benefit from interoperable methods for pushing updates to an open client.

gettext: the agony and the ecstasy

I’ve been poking around on StatusNet’s i18n to see if we can get localizations working with less fidgeting; all languages should “just work” when you drop in your StatusNet install.

We’re loading translations using gettext, which unfortunately can only load translations for languages which are set up as locales system-wide. This is massively problematic and has led to localization just plain not working for most people; it’s even broken on identi.ca!

The good news is that we already have a compatibility layer that doesn’t have this limitation: php-gettext, which provides a source-compatible drop-in gettext interface in pure PHP.

The bad news is that if the native gettext module *is* present, there doesn’t seem to be any way to override calls to _() — PHP has no facility for monkeypatching to replace existing functions. So, to force use of the compat functions for unsupported locales we’d need to change all calls to a new name.

Any preferences between something short and cryptic like __() or something clearer and StatusNet-y like common_msg()?

This is also a good time to think about handling localization for plugins; currently the plugins that ship with StatusNet’s source just have their localizations lumped into StatusNet’s main file — that obviously won’t do for plugins that are maintained and distributed separately.

In the gettext model it looks like the way to handle this is to use multiple “domains”; StatusNet’s core domain is “statusnet” (duh) and set as the default domain for gettext calls. A plugin can bind its own locale subdirectory to another domain (say “ldapplugin”) and instead of calling _("Some text") can call dgettext("ldapplugin", "Some text").

This could perhaps be simplified by adding helper methods onto the Plugin base class…

$this->_("blah")
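A minimal sketch of what such a helper might look like. The class layout, the bindLocale() and domain() names, and the "lowercased class name as domain" convention are all assumptions for illustration, not StatusNet's actual Plugin API. bindtextdomain() and dgettext() are the standard gettext-extension functions; the sketch falls back to the untranslated string if the extension isn't loaded.

```php
<?php
// Sketch only: method names and the domain-naming convention are hypothetical.

abstract class Plugin
{
    /** gettext domain for this plugin, e.g. "ldapplugin" for LdapPlugin (assumed convention). */
    protected function domain(): string
    {
        return strtolower(get_class($this));
    }

    /** Bind the plugin's own locale directory to its domain. */
    public function bindLocale(string $localeDir): void
    {
        if (function_exists('bindtextdomain')) {
            bindtextdomain($this->domain(), $localeDir);
        }
    }

    /** Translate a message from the plugin's domain rather than the default one. */
    protected function _(string $msg): string
    {
        return function_exists('dgettext') ? dgettext($this->domain(), $msg) : $msg;
    }
}

class LdapPlugin extends Plugin
{
    public function greeting(): string
    {
        // Looks the string up in the "ldapplugin" domain.
        return $this->_('Some text');
    }
}

$plugin = new LdapPlugin();
$plugin->bindLocale(__DIR__ . '/locale');
echo $plugin->greeting(), "\n";
```

With no catalog installed for the plugin's domain, the call just returns the original string, so plugin code can use the helper before any translations exist.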

Thoughts?

Updated 2009-10-16: I seem to be able to work around the locale setup problem by setting a valid locale before setting the invalid one. :P That should hold us for a while before we try larger changes.


Cross-posted w/ StatusNet-Dev mailing list.