<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:ent="http://www.purl.org/NET/ENT/1.0/" version="2.0">
  <channel>
    <title>Curiouser and Curiouser! on open-data-formats</title>
    <link>http://matt.blogs.it/</link>
    <description>RSS feed for topic open-data-formats</description>
    <copyright>Copyright 2006 Matt Mower</copyright>
    <generator>Squib/0.4.0.345</generator>
    <managingEditor>self@mattmower.com</managingEditor>
    <webMaster>self@mattmower.com</webMaster>
    <language>en-gb</language>
    <item>
      <title>Word escape velocity attained at last!</title>
      <link>http://matt.blogs.it/entries/00001423.html</link>
      <pubDate>Mon, 26 Apr 2004 09:22:41 +0100</pubDate>
      <description>&lt;blockquote&gt;&lt;a href="http://www.sockdrawer.org/blog/archives/000012.php"&gt;Using Open Office to convert MS Word documents&lt;/a&gt;.
Rickard Öberg recently posted a request for suggestions about using
Java to convert MS word docs into HTML. I have been doing some work on
this lately using the freely available, open-source OpenOffice.org to
do the hard parts, making calls... [&lt;a href="http://www.sockdrawer.org/blog/"&gt;sockdrawer.org&lt;/a&gt;]&lt;br&gt;
&lt;/blockquote&gt;
&lt;br&gt;
Paul has done some very crafty work here -- many people want to solve this problem.&lt;br&gt;
&lt;br&gt;
In fact it's a problem that he and I worked on last year.&amp;nbsp; At that
time we were looking for out-of-the-box tools to do the job and not
getting very far with it.&amp;nbsp; Since then he's cooked up a clever
solution by implementing RPC with an OpenOffice server.&amp;nbsp; Neat!!&lt;br&gt;
</description>
      <guid isPermaLink="true">http://matt.blogs.it/entries/00001423.html</guid>
      <ent:cloud ent:href="http://matt.blogs.it/topics/">
      </ent:cloud>
    </item>
    <item>
      <title>Combining OPML and RSS to create an export format for a blog</title>
      <link>http://matt.blogs.it/entries/00002173.html</link>
      <pubDate>Mon, 10 Apr 2006 09:30:46 +0100</pubDate>
      <description>&lt;p&gt;&lt;a href="http://blog.broadbandmechanics.com/2006/04/staying-on-top-whats-up"&gt;Marc Canter links&lt;/a&gt; to Joe Brockmeier's post about &lt;a href="http://internet.newsforge.com/article.pl?sid=06/04/04/2051237&amp;amp;from=rss"&gt;weblogs having a shared format&lt;/a&gt;. Timely.&lt;/p&gt;

&lt;p&gt;I've been thinking about this myself because I want such a format too. Although I have &lt;a href="http://squib.rubyforge.org/"&gt;written a tool to serve my own needs&lt;/a&gt;, I won't be using it forever, and I (&lt;em&gt;probably&lt;/em&gt;) want to take my blog with me. I've also been thinking about how to do backup and restore. The two problems appear to be the same to me.&lt;/p&gt;

&lt;p&gt;I think we already have the answer: RSS. It's already a natural format for holding the essential data of a weblog, and namespacing is an easy way to store the tool-specific data. A tool that understands another tool's metadata (e.g. &lt;a href="http://matt.blogs.it/specs/ENT/1.0/"&gt;ENT topics&lt;/a&gt;) can import it; a tool that cannot can safely ignore it. Actually, &lt;em&gt;why are we even discussing this?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The real question seems to me to be: how best to use RSS for this purpose? Do we have one gigantic RSS feed for a weblog? In my case, with about 2100 posts, it would be pretty big and unwieldy. Back in 2004, Paolo and I were talking about how to do &lt;a href="http://matt.blogs.it/2004/10/13.html#a1596"&gt;weblog&lt;/a&gt; &lt;a href="http://paolo.evectors.it/2004/10/07.html#a2276"&gt;archives&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I was messing with an approach that combined RSS and OPML to create a weblog archive. For each post/day/month (pick your granularity) create a corresponding RSS feed of weblog entries. These feeds are then referenced from an OPML file that defines the overall structure of the archived weblog. In this way you can quickly narrow down to find an individual post, or suck up the whole thing (useful for tools like &lt;a href="http://anjo.blogs.com/metis/"&gt;Sigmund&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;For convenience, the whole lot could be wrapped up in a .tar.gz. It might be helpful to include some kind of (optional?) metadata file at the top level that describes the contents (à la the manifest in a JAR archive).&lt;/p&gt;

&lt;p&gt;I'm not sure why I stopped working on that; maybe it just got shoved aside by other things. I might have a go at adding this feature to Squib since we need a backup format anyway.&lt;/p&gt;</description>
      <guid isPermaLink="true">http://matt.blogs.it/entries/00002173.html</guid>
      <ent:cloud ent:href="http://matt.blogs.it/topics/">
      </ent:cloud>
    </item>
    <item>
      <title>An experimental OPML+RSS archive for C&amp;C</title>
      <link>http://matt.blogs.it/entries/00002181.html</link>
      <pubDate>Wed, 12 Apr 2006 19:48:29 +0100</pubDate>
      <description>&lt;p&gt;Over the last couple of days I've hacked together experimental support for OPML+RSS archives in &lt;a href="http://squib.rubyforge.org/"&gt;Squib&lt;/a&gt; as I &lt;a href="http://matt.blogs.it/entries/00002173.html"&gt;described a couple of days ago&lt;/a&gt;. You can grab my entire archive &lt;a href="http://matt.blogs.it/archive/"&gt;from here&lt;/a&gt; either directly or as a .tar.gz archive.&lt;/p&gt;

&lt;p&gt;The structure of the archive looks like this:&lt;/p&gt;

&lt;p&gt;&lt;img src="http://matt.blogs.it/images/misc/archive_structure.jpg" alt="OPML+RSS weblog archive format"/&gt;&lt;/p&gt;

&lt;p&gt;The weblog.opml file is an outline that contains the date-based structure of Curiouser and Curiouser. There is a branch for each year, and each month of each year. At the leaves are pointers to daily RSS files and the ID &amp;amp; title of entries.&lt;/p&gt;
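
&lt;p&gt;A minimal sketch of how an outline like this could be generated, assuming Python's standard &lt;code&gt;xml.etree.ElementTree&lt;/code&gt; module. The function name, the &lt;code&gt;entryId&lt;/code&gt; attribute, the OPML version and the &lt;code&gt;YYYY/MM/DD.rss&lt;/code&gt; file naming are all hypothetical illustrations, not necessarily what Squib actually writes:&lt;/p&gt;

&lt;pre&gt;
```python
import xml.etree.ElementTree as ET


def build_archive_outline(entries):
    """Build an OPML tree: year, then month, then day, then entry.

    entries is an iterable of (year, month, day, entry_id, title) tuples.
    """
    # Assumption: OPML 2.0, which allows outline nodes of type "link".
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = "Weblog archive"
    body = ET.SubElement(opml, "body")

    years, months, days = {}, {}, {}
    for year, month, day, entry_id, title in sorted(entries):
        if year not in years:
            years[year] = ET.SubElement(body, "outline", text=str(year))
        month_key = (year, month)
        if month_key not in months:
            months[month_key] = ET.SubElement(
                years[year], "outline", text="%04d-%02d" % month_key)
        day_key = (year, month, day)
        if day_key not in days:
            # Hypothetical layout: one RSS file per day, named by date.
            days[day_key] = ET.SubElement(
                months[month_key], "outline",
                text="%04d-%02d-%02d" % day_key,
                type="link", url="%04d/%02d/%02d.rss" % day_key)
        # Leaf: only the ID and title; the full entry lives in the RSS file.
        ET.SubElement(days[day_key], "outline", text=title, entryId=entry_id)
    return opml


outline = build_archive_outline([
    (2006, 4, 12, "00002181", "An experimental OPML+RSS archive"),
    (2006, 4, 10, "00002173", "Combining OPML and RSS"),
])
print(ET.tostring(outline, encoding="unicode"))
```
&lt;/pre&gt;

&lt;p&gt;Because each leaf carries only an ID and a title, the outline stays small however long the individual entries are.&lt;/p&gt;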

&lt;p&gt;It occurred to me that I could just put the entire entry data directly into the OPML file and cut out the RSS. However, with over 2,100 entries, I felt that would lead to a very big and unwieldy file. Being just a file of pointers means it can still be sensibly opened in an OPML editor.&lt;/p&gt;

&lt;p&gt;Another reason for using RSS is to ensure that users of the archive can take advantage of all the software out there to parse RSS. Once you've figured out which day's entries you want, you can hand the corresponding RSS file to a standard parser and get back the entries.&lt;/p&gt;
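
&lt;p&gt;As a rough illustration of that last step, here is a sketch that reads the items out of one daily RSS 2.0 file using only Python's standard library. The function names are hypothetical, and the sample feed is a stand-in built in memory rather than a real file from the archive:&lt;/p&gt;

&lt;pre&gt;
```python
import xml.etree.ElementTree as ET


def entries_from_rss(rss_text):
    """Return a (guid, title, link) tuple for each item in an RSS 2.0 document."""
    channel = ET.fromstring(rss_text).find("channel")
    return [
        (item.findtext("guid"), item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]


def sample_daily_feed():
    """Build a tiny stand-in for one day's RSS file (illustrative data only)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Entries for one day"
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = "An experimental OPML+RSS archive"
    ET.SubElement(item, "link").text = "http://matt.blogs.it/entries/00002181.html"
    ET.SubElement(item, "guid").text = "http://matt.blogs.it/entries/00002181.html"
    return ET.tostring(rss, encoding="unicode")


for guid, title, link in entries_from_rss(sample_daily_feed()):
    print(guid, title, link)
```
&lt;/pre&gt;

&lt;p&gt;A dedicated feed parsing library would cope better with malformed feeds and namespaced extensions; the point here is only that nothing archive-specific is needed to read the entries.&lt;/p&gt;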

&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: I've added &lt;code&gt;&amp;lt;link rel="archive" type="application/opml+xml" href=".../archive/data/weblog.opml" /&amp;gt;&lt;/code&gt; to my home page to allow archive auto-discovery. I did only a minimal amount of research before doing this, so please correct me if that's a gross misuse of a link tag or if there is already an established way of doing this.&lt;/p&gt;</description>
      <guid isPermaLink="true">http://matt.blogs.it/entries/00002181.html</guid>
      <ent:cloud ent:href="http://matt.blogs.it/topics/">
      </ent:cloud>
    </item>
  </channel>
</rss>
