Putting the media in Wikimedia!

I’m here in the city of lights for Wikimedia’s big Multimedia Usability meeting. We’ve got a fair chunk of our MediaWiki devs, plus folks from more of the media & communications end of the organization, in one place to hash out some of the key issues and see what we can really accomplish in the short and medium term. That includes long-needed reworking of the upload interface and of the workflow for manually tidying up metadata on newly uploaded files, which sometimes come in batches of many thousands!

Since my free time’s pretty low these next couple months I’m trying to keep my own commitments to where I can pack the most punch…

Things to do proof-of-concept coding on, to confirm our implementation theory:

  • Metadata!
    • Proof of concept for template/field/subtemplate extraction and mapping to RDF
    • Try to organize w/ Robert about how to store and search the attached RDF fields in Lucene
    • [Note: I’ve been pulling existing EXIF data to get some stats about what can be pre-extracted; only about 7% of Commons files with EXIF data have a ‘Copyright’ field.]
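
For the template/field extraction idea, a very rough sketch of the shape it might take, and not the actual proof of concept. It assumes a flat {{Information}}-style template with no nested subtemplates or piped links, and the Dublin Core predicate names in the mapping are purely illustrative guesses:

```python
import re

# Hypothetical mapping from template field names to RDF predicates.
# The Dublin Core targets are an assumption made for illustration.
FIELD_TO_PREDICATE = {
    "description": "dc:description",
    "author": "dc:creator",
    "date": "dc:date",
    "source": "dc:source",
    "permission": "dc:rights",
}


def extract_fields(wikitext, template="Information"):
    """Return {field: value} for the first flat use of `template`.

    Nested templates and piped links are NOT handled; a real parser
    would need to track brace and bracket depth.
    """
    pattern = r"\{\{\s*" + re.escape(template) + r"\s*\|(.*?)\}\}"
    match = re.search(pattern, wikitext, re.S | re.I)
    if not match:
        return {}
    fields = {}
    for part in match.group(1).split("|"):
        name, sep, value = part.partition("=")
        if sep:  # skip unnamed parameters
            fields[name.strip().lower()] = value.strip()
    return fields


def to_triples(subject, fields):
    """Turn known, non-empty fields into (subject, predicate, object) triples."""
    return [
        (subject, FIELD_TO_PREDICATE[name], value)
        for name, value in fields.items()
        if name in FIELD_TO_PREDICATE and value
    ]
```

Fields that have no entry in the mapping are simply dropped here; whether they should be kept under some generic predicate is one of the things a real proof of concept would have to settle.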

Things to ponder specs on:

Other things to peek at and give some directional advice on:

  • Check out what it’d take to integrate Geohack tools better (via Magnus)
  • Take a peek at Unicode encoding & keyboard input problems for some languages requiring funky script support such as Malayalam (via Gerard)

Update: Also want to poke XMPP RC test setup per Duesentrieb. :D

Screen integration with terminals?

As a guy who spends a lot of time in remote Linux shells from a laptop, I’m looking for better integration between my terminal emulators and screen sessions.

  • Let me use native scrollbars to access the backscroll!
  • Start me in screen by default so I don’t forget to start one.
  • If I have disconnected sessions, let me choose to reconnect or create a new session, with some reasonable menu.
  • Automatically reconnect to the server and the screen session after network disruption (switching networks, sleeping the machine overnight, etc)
  • Don’t mess up backspace. (This plagues me on Mac clients a lot. Backspace works fine in a regular terminal but becomes forward delete in a screen session. WTF?)

Linux & Mac clients both welcome… Anybody know something down this road already available?
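
In the meantime, the "reconnect or create" menu from the list above could be faked with a small login-time script. This is only a sketch: it assumes the usual `screen -ls` listing format (session id, optional date column, then Attached/Detached), and the helper names are made up for illustration.

```python
import os
import re
import subprocess

# Matches session lines of `screen -ls`, e.g. "\t12345.pts-0.host\t(Detached)".
# Some versions print a date column between the name and the state.
SESSION_LINE = re.compile(r"^\s+(\d+\.\S+)\s.*\((Attached|Detached)\)\s*$")


def detached_sessions(listing):
    """Return the ids of detached sessions found in `screen -ls` output."""
    found = []
    for line in listing.splitlines():
        match = SESSION_LINE.match(line)
        if match and match.group(2) == "Detached":
            found.append(match.group(1))
    return found


def command_for(sessions, choice):
    """Map a menu choice (1-based; anything else means 'new') to an argv list."""
    if choice.isdecimal() and 1 <= int(choice) <= len(sessions):
        return ["screen", "-r", sessions[int(choice) - 1]]
    return ["screen"]


def main():
    # No check=True: some versions exit non-zero when there are no sessions.
    result = subprocess.run(["screen", "-ls"], capture_output=True, text=True)
    sessions = detached_sessions(result.stdout)
    choice = ""
    if sessions:
        for number, name in enumerate(sessions, 1):
            print(f"{number}) reattach {name}")
        choice = input("Pick a number, or press Enter for a new session: ").strip()
    cmd = command_for(sessions, choice)
    os.execvp(cmd[0], cmd)  # replace this process with screen
```

Calling `main()` from the remote login profile would cover the "start me in screen by default" wish too; it does nothing for scrollbars, reconnecting after network drops, or the backspace problem.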

Dell Mini love

We finally replaced my fiancée’s ancient PC with a shiny new Dell laptop. While ordering, I couldn’t help myself and tossed in an Inspiron Mini 9 for me, too.

This little cutie weighs in at just 2.26 pounds, less than half of my MacBook’s hefty 5 pounds. I’ve found that the Mini is much more back-friendly than my MacBook; I can painlessly lug it to the office with my laptop bag slung over my shoulder (easier for getting on and off the subway) instead of nerding it up in backpack mode.

The top-end model I picked packs 16GB storage and 1GB RAM running on a 1.6 GHz Atom processor — far more powerful than the computer I took with me to college in 1997. Admittedly, my iPhone also beats that computer at 8GB/128MB/300MHz vs 6.4GB/64MB/266MHz. :P

The compact form factor does have some impact on usability, though. The 1024×600 screen sometimes feels too tight for vertical space, but they include a handy full-screen zoom hotkey for the window manager which opens things up.

The keyboard feels a bit cramped, and some of the keys are in surprising places (the apostrophe and hyphen are frequent offenders), but it’s still a lot easier to type serious notes or emails on than the iPhone. I had to disable the trackpad’s click and scrolling options to keep from accidentally pasting random text with my palms while typing…

The machine shipped with a customized Ubuntu distribution which is fully functional; they include a “friendly” launcher app which can be easily disabled, and even the launcher doesn’t interfere too badly. The desktop launch bar that’s crept into Gnome nicely handles my “I need Spotlight to launch stuff with the keyboard” fix. :) Firefox works fine (after uninstalling lots of Yahoo! extensions), Thunderbird installed easily enough, and I even got Skype to work with my USB headset! (AT&T’s international roaming charges can bite me…)

The biggest obstacle for me to use this machine every day is my Yojimbo addiction. I use Yojimbo for darn near everything: random notes, travel plans, budgeting, grocery lists, recipes, encrypted password stores, saving articles and documentation for future reference. It’s insanely easy to use, the search works, I don’t have to remember where I saved anything, and it syncs across all my Macs. But… it’s Mac-only. :(

I’m trying out WebJimbo, which provides an AJAX-y web interface for remotely accessing your Yojimbo notes. It’s very impressive for what it does, but I’m hitting some nasty brick walls: editing a note with formatting drops all the formatting, and I use embedded screen shots and coloring extensively in my notes.

I’ve seen some reports of people hacking Mac OS X onto the Dell Mini — very tempting to avoid OS switching overhead. :) But I think if I really want that, eventually I should just suck it up and buy a MacBook Air. The form factor is the same as my MacBook (full keyboard, roomier 1280×800 screen), but at 3 pounds it’s much closer to the Mini than to my regular MacBook in weight, so should be about as back-friendly for the subway commute and air travel.

Of course, the Air costs $1799 and I got my tricked-out Mini for about $400, so… I’ll save my pennies and see. ;)

Dupe uploads check

We had a post come in this morning on mediawiki-l about an extension for adding a hash-based duplicate file check on upload.

A similar check had been recently added to MediaWiki core, but only on the file description page; it wouldn’t stop your upload while you were in the process. Since all the required backend was there, I’ve gone ahead and added it in as a built-in feature for MediaWiki 1.13.

(Note that since we can’t get the file content hash until you upload, there’s still no way to give the warning before you actually upload it. But at least now it lets you cancel at that point instead of having to ask a sysop to come delete your file!)
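
The flow described above (hash the content once it has arrived, look for matches, and let the uploader back out) can be sketched roughly like this. It is an illustration only, not the MediaWiki code: the in-memory index stands in for the real database lookup, the choice of SHA-1 is an assumption, and every name here is hypothetical.

```python
import hashlib


class DuplicateIndex:
    """In-memory stand-in for a table mapping content hashes to file names."""

    def __init__(self):
        self._by_hash = {}

    @staticmethod
    def _digest(content):
        # SHA-1 is an assumption; the post only says the check is hash-based.
        return hashlib.sha1(content).hexdigest()

    def find_duplicates(self, content):
        """Return names of already stored files with identical content."""
        return list(self._by_hash.get(self._digest(content), []))

    def store(self, name, content):
        self._by_hash.setdefault(self._digest(content), []).append(name)


def handle_upload(index, name, content, confirmed=False):
    """Warn on duplicates unless the uploader has already confirmed.

    The hash can only be computed once the bytes have arrived, so the
    warning necessarily comes after the transfer, not before it.
    """
    dupes = index.find_duplicates(content)
    if dupes and not confirmed:
        return ("warn", dupes)  # nothing stored; the uploader can cancel
    index.store(name, content)
    return ("stored", dupes)
```

A first call with new content returns ("stored", []); a second upload of the same bytes under another name returns ("warn", [...]) and stores nothing until it is repeated with confirmed=True.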

German FlaggedRevs tested for 10 minutes

Ok, so we finally got the FlaggedRevs for German Wikipedia config set up… then turned it off after a few minutes.

We did, alas, encounter a few problems, which didn’t come up as much in earlier testing, but came up *hard* in a few minutes around 3am at Wikipedia. :)

First, floating UI boxes and floating infoboxes don’t mix well.

The small versioning marker is really nice, but the collision with infoboxes is way too disruptive, and we’ll need to get it worked out one way or another.

Second, some of the reporting pages weren’t working, in part due to some last-minute tweaks to the DB layout to make it easier to deploy. (This should be fixed now.)

Third, the “redirected from” subtitles are being broken, which’ll disrupt some general editing functionality in an unpleasant way. There’s an example on the de.labs test wiki.

Once the UI bits are fixed up, we’ll give it another test run… und FlaggedRevs kommt wieder!

Top 10 Wikimedia DB errors

I did a quick look last night through our database error logs for the last week or so, breaking them down by function and error type. Here’s the top ten function-err loci:

Hits  Function                       errno  Error
620   Article::updateCategoryCounts  1213   Deadlock found when trying to get lock; Try restarting transaction
240   Article::insertOn              1062   Duplicate entry 'N-XXX' for key 2
41    Article::doDeleteArticle       1213   Deadlock found when trying to get lock; Try restarting transaction
26    LinksUpdate::incrTableUpdate   1213   Deadlock found when trying to get lock; Try restarting transaction
19    TitleKey::prefixSearch         1030   Got error 28 from table handler
9     Title::invalidateCache         1213   Deadlock found when trying to get lock; Try restarting transaction
9     (blank)                        2013   Lost connection to MySQL server during query
8     User::saveSettings             1205   Lock wait timeout exceeded; Try restarting transaction
8     TitleKey::prefixSearch         2003   Can't connect to MySQL server on 'XXX'
7     Job::pop                       1213   Deadlock found when trying to get lock; Try restarting transaction
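
For anyone wanting to do a similar breakdown, here is a minimal sketch of the tallying step. The log layout is an assumption made up for the example (one tab-separated line per error: function, errno, message); the real error log format is not shown here.

```python
from collections import Counter


def top_error_loci(lines, limit=10):
    """Count errors per (function, errno) pair and return the most frequent.

    Assumes each line looks like "function<TAB>errno<TAB>message". The
    message is ignored when grouping, so variable parts such as key
    values do not split one locus into many.
    """
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # skip lines that do not fit the assumed layout
        function, errno = parts[0], parts[1]
        counts[(function, errno)] += 1
    return counts.most_common(limit)
```

Grouping on the pair rather than on the function alone keeps, say, a connection failure and a disk-space failure in the same function as separate rows.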

A large chunk of our DB errors are from conflicting transactions; the number one spot is currently taken up by updates to category counts, which are often part of an expensive page deletion transaction.

We’re often pretty lazy about rerunning database transactions when they’re rolled back, throwing an error and making the end-user resubmit the change. This is kind of lame, but at least the transaction rollback theoretically keeps the database consistent.
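
The less lazy approach would be to rerun the whole transaction when the server reports a deadlock. A minimal sketch of that idea follows; it is not what MediaWiki does. `DBError` is a hypothetical exception type, and the callable is assumed to begin, commit, and roll back its own transaction so that running it again is safe.

```python
import time

ER_LOCK_DEADLOCK = 1213  # MySQL "Deadlock found when trying to get lock"


class DBError(Exception):
    """Hypothetical stand-in for a database driver's error type."""

    def __init__(self, errno, message=""):
        super().__init__(f"{errno}: {message}")
        self.errno = errno


def run_with_retry(transaction, attempts=3, base_delay=0.05, sleep=time.sleep):
    """Call `transaction()` and rerun it if it was rolled back by a deadlock.

    Other errors, and a deadlock on the final attempt, are re-raised so
    the caller can still report them to the user.
    """
    for attempt in range(1, attempts + 1):
        try:
            return transaction()
        except DBError as err:
            if err.errno != ER_LOCK_DEADLOCK or attempt == attempts:
                raise
            # Back off a little longer each time before retrying.
            sleep(base_delay * 2 ** (attempt - 1))
```

Only the deadlock code is retried here; a duplicate-key error like 1062 would just fail again, so it is passed straight through.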

The number two spot seems to be for conflicting page creations — possibly due to automatic resubmissions after a slow save operation.

There are a few “disk full” errors (the “error 28” from the table handler, which is the operating system’s out-of-space code); these were probably due to a transitory error on one DB box.


Is this dialog showing success or failure?


Look closer…


Wha? I’m still not sure what’s going on, and I still don’t seem to have a search index in the KDE Help Center. Sigh.

Update: htdig wasn’t installed, which it didn’t report very well. After installing I can apparently build the index, but search still fails.

Again, the error doesn’t get reported well in the UI — it just echoes the khc_htsearch.pl command line instead of explaining what went wrong:

$ khc_htsearch.pl --docbook --indexdir=/home/brion/.kde/share/apps/khelpcenter/index/ --config=kde_application_manuals --words=multi-file+search --method=and --maxnum=5 --lang=en
Can't execute htsearch at '/srv/www/cgi-bin/htsearch'.


GParted rocks

Did some upgrades on my girlfriend’s Windows PC today… The techs who originally set up her computer gave her an unconscionably small C: drive, a tiny 10 gig slice of an already-modest 40 gig drive. Even with careful discipline trying to put things on the D: partition, 10 gigs doesn’t go very far. Shared DLL installs, gobs of temporary files, cached updaters for all manner of software, etc. all fill that space up, and it was constantly running out of room.

My secret weapon to fix this was to be an Ubuntu Linux live CD, which conveniently comes with GParted.

I took a 200 gig drive left over from my dear departed Linux box and hooked it up, figuring I could back up the old data over the network, overwrite it with a raw disk image from the 40 gig drive, and then resize the NTFS partitions to a livable size.


Well, sort of. :)

It turns out I could have saved myself some trouble at the command line by copying the partitions across drives with GParted itself instead of goin’ at it all old-school with dd. (Neat!)

I had two sticking points, though.

First, it didn’t seem to let me move the extended (D:) partition to a different place on the drive. That meant there was no room to expand the C: partition, which was the point of the exercise.

I ended up having to create a copy of the D: partition, which it let me put in the middle of the drive, and then delete the old partitions. Kind of roundabout, and it changed the partition type from extended to primary, but Windows doesn’t seem to care about that so keep those fingers crossed…

My second snag was due to Ubuntu’s user-friendliness. As soon as the new partition was created, the system mounted it — which caused the NTFS cloning process to abort, warning that it can’t work on a mounted filesystem.


Had to go into the system settings and disable automatic mounting of removable media… luckily that’s easy to find in the menus. If you know it’s going to be there, at least. :)