Monday, April 10, 2006

Combining OPML and RSS to create an export format for a blog

Marc Canter links to Joe Brockmeier's post about weblogs having a shared format. Timely.

I've been thinking about this myself because I want such a format too. Although I have written a tool to serve my own needs, I won't be using it forever, and I (probably) want to take my blog with me. I've also been thinking about how to do backup and restore. The two problems appear to be the same to me.

I think we already have the answer: RSS. It's already a natural format for holding the essential data of a weblog, and namespacing is an easy way to store the tool-specific data. A tool that understands another tool's metadata (e.g. ENT topics) can import it; a tool that cannot can safely ignore it. Actually, why are we even discussing this?
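To make the namespacing idea concrete, here is a minimal sketch in Python. The namespace URI, the "mytool" prefix, the "topic" element and the function names are all made up for illustration; they are not the real ENT vocabulary or any particular tool's format.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for one tool's private metadata (an assumption,
# not a real published vocabulary).
TOOL_NS = "http://example.com/ns/mytool"
ET.register_namespace("mytool", TOOL_NS)


def build_item(title, link, body, topics):
    """Build an RSS <item> holding core fields plus namespaced extras."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, "description").text = body
    for topic in topics:
        # Tool-specific data lives in its own namespace.
        ET.SubElement(item, f"{{{TOOL_NS}}}topic").text = topic
    return item


def core_fields(item):
    """What a tool that does not know TOOL_NS would import:
    only the un-namespaced children, everything else is skipped."""
    return {child.tag: child.text
            for child in item
            if not child.tag.startswith("{")}


def tool_topics(item):
    """What a tool that does understand TOOL_NS can additionally read."""
    return [el.text for el in item.findall(f"{{{TOOL_NS}}}topic")]
```

An importer that only calls core_fields() still gets a usable post, while one that also calls tool_topics() recovers the extra metadata.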

The real question seems to me to be: how best to use RSS for this purpose? Do we have one gigantic RSS feed for a weblog? In my case with about 2100 posts it would be pretty big and unwieldy. Back in 2004 Paolo and I were talking about how to do weblog archives.

I was messing with an approach that combined RSS and OPML to create a weblog archive. For each post/day/month (pick your granularity) create a corresponding RSS feed of weblog entries. These feeds are then referenced from an OPML file that defines the overall structure of the archived weblog. In this way you can quickly narrow down to find an individual post, or suck up the whole thing (useful for tools like Sigmund).
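Roughly, and picking one feed per month as the granularity, the structure could be sketched like this. The post dictionary keys, the "feeds/YYYY-MM.xml" layout and the use of relative paths in xmlUrl are assumptions for the sake of the example, not a description of the original experiment.

```python
from collections import defaultdict
from datetime import date
import xml.etree.ElementTree as ET


def group_by_month(posts):
    """Group posts (dicts with title, link, body, date) by 'YYYY-MM'."""
    groups = defaultdict(list)
    for post in posts:
        groups[post["date"].strftime("%Y-%m")].append(post)
    return dict(groups)


def month_feed(month, posts):
    """Build one RSS 2.0 document holding a single month's entries."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Archive for {month}"
    ET.SubElement(channel, "link").text = "http://example.com/"  # placeholder
    ET.SubElement(channel, "description").text = f"Posts from {month}"
    for post in sorted(posts, key=lambda p: p["date"]):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["title"]
        ET.SubElement(item, "link").text = post["link"]
        ET.SubElement(item, "description").text = post["body"]
        # Only the day is known in this sketch, so the time is fixed.
        ET.SubElement(item, "pubDate").text = post["date"].strftime(
            "%a, %d %b %Y 00:00:00 GMT")
    return rss


def opml_index(title, months):
    """Build the OPML file that points at each month's feed."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for month in sorted(months):
        # Hypothetical layout: feeds sit next to the index in feeds/.
        ET.SubElement(body, "outline", text=month, type="rss",
                      xmlUrl=f"feeds/{month}.xml")
    return opml
```

A reader after a single post only needs the OPML index plus one month's feed; a tool that wants everything walks the outlines and loads each feed in turn.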

For convenience the whole lot could be wrapped up in a .tar.gz. It might be helpful to include some kind of (optional?) metadata file at the top level that describes the contents (à la JAR archives).
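A possible shape for the packaging step, using Python's standard tarfile module. The file name MANIFEST and its simple "Key: value" lines are assumptions loosely modelled on JAR manifests, not an agreed format.

```python
import io
import tarfile


def build_archive(files, manifest):
    """Return .tar.gz bytes containing a top-level MANIFEST plus files.

    files:    dict mapping archive path -> bytes
    manifest: dict mapping key -> value, written as 'Key: value' lines
    """
    manifest_text = "".join(f"{k}: {v}\n" for k, v in manifest.items())
    members = {"MANIFEST": manifest_text.encode("utf-8"), **files}
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_manifest(archive_bytes):
    """Read the manifest back without unpacking the rest of the archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        text = tar.extractfile("MANIFEST").read().decode("utf-8")
    return dict(line.split(": ", 1) for line in text.splitlines() if line)
```

A restore tool could read the manifest first to decide whether it understands the archive before touching any of the feeds.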

I'm not sure why I stopped working on that; maybe it just got shoved aside by other things. I might have a go at adding this feature to Squib, since we need a backup format anyway.

10/04/2006 09:30 by Matt Mower