Rich-content embedding for the web: time for oEmbed 2?

One of the technologies I encountered working at StatusNet that I’m very interested in bringing to MediaWiki is oEmbed, a protocol for fetching enough information about a photo or video that another web site can, reasonably sanely, drop a thumbnail view or inline video player into its own pages.

oEmbed and similar discovery systems mostly appear behind the scenes, but pop up in WordPress, Twitter, Facebook and the like to make a lot of embedding fairly transparent when you start linking to a URL with a cool photo or video on it.

Traditionally, embedding a photo or video from another site into your own requires manually copy-pasting a chunk of HTML, which is presented to you in some side option on the hosting site. The advanced video player in use on Wikimedia Commons presents such an option.

The actual code might be an <iframe>, <script>, <embed>, <object>, etc., depending on the player technology, but in essence it’s a blob of stuff that’s fairly opaque to most people.

There are two huge problems with this:

  1. Taking raw HTML from another site is potentially unsafe: because of this, most modern blogs, forums, wikis, and social networks don’t actually allow you to paste raw HTML in and expect it to work.
  2. Finding the embeddable HTML is nonstandard and unwieldy: you’ll find it in different places on Flickr, YouTube, and Wikimedia Commons, and there’s no consistent way for machines to find it for you.

oEmbed essentially provides a standard system for discovery of the embedding information. Given the URL to some interesting resource (a Flickr photo page, a YouTube video page), a consuming site can look up the oEmbed API endpoint for that page and send it a request for embedding information about the resource, optionally asking for it to fit within a given maximum size.
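To make that flow a bit more concrete, here is a rough Python sketch of the consumer side. It assumes the provider advertises its endpoint with a <link rel="alternate"> tag typed application/json+oembed or text/xml+oembed, and that the request takes url, format, maxwidth and maxheight query parameters, as in the published oEmbed spec. The function names and the media.example host are made up for illustration, and nothing here actually fetches anything over the network.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# MIME types used for oEmbed discovery links.
OEMBED_TYPES = {"application/json+oembed", "text/xml+oembed"}


class OEmbedLinkFinder(HTMLParser):
    """Collects (type, href) pairs from oEmbed <link> tags in a page."""

    def __init__(self):
        super().__init__()
        self.endpoints = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "alternate" in rel and attr.get("type") in OEMBED_TYPES and attr.get("href"):
            self.endpoints.append((attr["type"], attr["href"]))


def discover_endpoints(page_html):
    """Return the oEmbed endpoints advertised in a resource page's HTML."""
    finder = OEmbedLinkFinder()
    finder.feed(page_html)
    return finder.endpoints


def add_query(url, params):
    """Append query parameters to a URL, skipping any whose value is None."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend((key, str(value)) for key, value in params.items() if value is not None)
    return urlunsplit(parts._replace(query=urlencode(query)))


def build_request(endpoint, resource_url, maxwidth=None, maxheight=None):
    """Build a JSON oEmbed request URL for a known endpoint (hypothetical helper)."""
    return add_query(endpoint, {
        "url": resource_url,
        "format": "json",
        "maxwidth": maxwidth,
        "maxheight": maxheight,
    })
```

A consumer would fetch the page, call discover_endpoints() on its HTML, tack on size limits with add_query(), and then fetch the resulting URL to get back a small JSON document describing the thumbnail or player.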

StatusNet currently uses this to fetch small thumbnails for linked photos and videos, but it’s possible to go further and embed video players and the like, if you trust the provider site or have the infrastructure to run the HTML through an offsite <iframe> (giving it an isolated security context in the browser).
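As a rough illustration of the isolation idea, here is a small Python sketch that wraps provider-supplied HTML in an <iframe>. Note that it swaps in a different technique from the offsite-host approach described above: it relies on the HTML5 sandbox and srcdoc attributes, which give the frame a unique origin when allow-same-origin is left out. The function name is hypothetical, and this is a sketch of the general shape rather than a vetted security boundary.

```python
import html


def sandboxed_embed(provider_html, width, height):
    """Wrap untrusted embed HTML in a sandboxed iframe (illustrative only)."""
    # Escape the markup so it is carried as an attribute value and cannot
    # break out into the surrounding page.
    srcdoc = html.escape(provider_html, quote=True)
    # "allow-scripts" lets a player run; omitting "allow-same-origin" keeps
    # the frame in its own opaque origin, away from the host page's cookies
    # and DOM.
    return (
        f'<iframe sandbox="allow-scripts" srcdoc="{srcdoc}" '
        f'width="{int(width)}" height="{int(height)}"></iframe>'
    )
```

The width and height would typically come from the oEmbed response itself, so the consuming page can reserve the right amount of space for the player.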

I’d very much like to set up a standard oEmbed provider for MediaWiki, which would make it easier to expose photos, videos, and other goodies on Wikipedia, Commons, and other Wikimedia sites, especially if/when we start doing more multimedia features like interactive maps, diagrams, physics simulations, etc., which can be exposed in the same way. Adding copyright & source metadata to the embedding information would be a huge help for us, especially if we can consume info from other sites too, indicating license compatibility etc.

Currently there’s a thread on the oEmbed mailing list discussing possible improvements for a next version of the protocol, of which metadata, safer (iframe-based) arbitrary HTML embedding, and the possibility of player control APIs are so far at the top of the list. If you’re interested in the low-level fun of embedding implementation, do feel free to stick your nose in and comment!


The system works!

Back in the olden days when Wikipedia was young and new, I used to get sucked into all sorts of bug hunts and fix deployments on the servers. While it can be a great fun time to do live debugging on a huge system like that, it’s also a major time sink!

Since coming back to Wikimedia a few weeks ago, I’ve tried to keep my focus narrower: my most direct work is preparation & experimental groundwork for the big parser & visual editor projects, so for other things I’m trying to be available for advice and review without actually diving fully into every cool thing that comes up.

Today we had a great example of how our “post-startup” Wikimedia Engineering department can work well in getting a quick live fix pushed:

  • Philippe was following up on a deletion issue reported from the community: a stray deleted category page was stuck in Google listings, causing some annoyance. He could direct Google to re-scan the page, but it kept claiming the page still existed. Unable to figure out why, Philippe asked me to take a look….
  • I confirmed that MediaWiki was reporting the page as existing (HTTP 200 OK rather than HTTP 404 Not Found) and, remembering vaguely how the 404 reporting worked, pulled up the source of CategoryPage.php to see exactly what it was checking for.
  • By hitting the MediaWiki API I could confirm that the current source code *should* return a 404 based on the category’s empty page counts in the database…
  • …but comparing the current development code against the MediaWiki 1.17 deployment branch, I could see that this was a new fix which hadn’t been deployed yet!
  • Since it was a clean simple fix, I marked it as ok on our CodeReview system and tagged it for merging.
  • I was then able to hit up some of the ops & dev folks in #wikimedia-dev to see if we could get the fix out quickly or if it would have to wait for later deployment in a batch with other stuff.
  • It being a nice clean isolated fix, Sam concurred it was safe to merge, and Arthur coordinated pushing it out. Within a few minutes, we confirmed the fix and then got it deployed to all wikis.
  • A quick purge on the affected page to ensure caches are clear, and Philippe was able to run it through Google again and get it cleared out immediately!

This is kind of an ideal case: the fix was already in development trunk, it’s a small clean patch that’s easy to review and merge, and testing the return code on a deleted category is very easy to do. But it does confirm that we’ve got a basically sane setup where issues can be pushed along from confirmation to merging to testing to deployment in a reasonably speedy fashion, without having to bottleneck it all through a specific “does-everything-themselves” person.

Makes me happy!