So, I’m writing blog software, and one of the obvious things I want to do is import from this blog. As a first step towards that, I exported the entire contents using WP’s export tool (after purging more than 17,000 spam comments that had piled up since I last purged manually), and this is what the post before this one looks like:
<item>
<title>Interesting words in your OSX Dictionary</title>
<link>http://fukamachi.org/wp/2008/03/11/fake-words-in-your-osx-dictionary/</link>
<pubDate>Tue, 11 Mar 2008 03:03:46 +0000</pubDate>
<dc:creator>Sho</dc:creator>
<category><![CDATA[Language]]></category>
<category><![CDATA[leopard]]></category>
<category><![CDATA[mac]]></category>
<category domain="tag"><![CDATA[dictionary]]></category>
<category domain="tag"><![CDATA[esquivalience]]></category>
<guid isPermaLink="false">http://fukamachi.org/wp/2008/03/11/fake-words-in-your-osx-dictionary/</guid>
<description></description>
<content:encoded><![CDATA[Using Leopard? Try this. Look up the word esquivalience by selecting it and choosing dictionary from the contextual menu. Read the dictionary definition, then the wikipedia one underneath : )]]></content:encoded>
<wp:post_id>713</wp:post_id>
<wp:post_date>2008-03-11 12:03:46</wp:post_date>
<wp:post_date_gmt>2008-03-11 03:03:46</wp:post_date_gmt>
<wp:comment_status>open</wp:comment_status>
<wp:ping_status>open</wp:ping_status>
<wp:post_name>fake-words-in-your-osx-dictionary</wp:post_name>
<wp:status>publish</wp:status>
<wp:post_parent>0</wp:post_parent>
<wp:menu_order>0</wp:menu_order>
<wp:post_type>post</wp:post_type>
</item>
Jesus, that is *horrible*. Firstly, if post_type is only declared near the end, why prefix everything with post_: post_id, post_date, post_name and so on? It’s a post, of post_type post! Secondly, where’s the “updated at” field? Why does the creator tag alone get the “dc:” namespace? What’s with the isPermaLink switch on the guid tag? The permalink is already in the link tag, I presume. Why does it need to be content:encoded when the content is obviously CDATA anyway, implying that WP somehow supports XML parsing inside some content!? Why is pubDate camelCase while everything else is underscore_style? Man, I hate camelCase. Etc etc. What a mess.
I know what you’re thinking: that’s just RSS format! Sure it’s ugly, it’s RSS! Well, no. The RSS feed is similar, but it differs for this post; I examined the feed for that, too. Note that the description is empty here, whereas in the RSS it isn’t. So they’re using a modified RSS format to store internal data. And if they’re not going to store the description, but just generate it on the fly, why export empty description tags at all?!
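Since the whole point here is importing, here’s a minimal sketch of flattening one exported item into a plain dict with Python’s standard xml.etree.ElementTree. The namespace URIs are assumptions: the fragment above has no xmlns declarations, and the wp: URI has changed between WordPress versions, so check the rss element of your own export. parse_wxr_item and the dict keys are hypothetical names of my own, not anything WordPress defines, and the sample body is trimmed.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs: the fragment above has no xmlns declarations,
# and the "wp" URI differs between WordPress versions.
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "wp": "http://wordpress.org/export/1.0/",
}


def parse_wxr_item(item):
    """Flatten one WXR <item> element into a dict (hypothetical helper)."""

    def text(path):
        # "" if the element is missing or empty (e.g. <description></description>).
        return item.findtext(path, default="", namespaces=NS)

    categories, tags = [], []
    for cat in item.findall("category"):
        # domain="tag" marks a tag; a bare <category> is a real category.
        (tags if cat.get("domain") == "tag" else categories).append(cat.text)

    return {
        "id": int(text("wp:post_id")),
        "type": text("wp:post_type"),
        "slug": text("wp:post_name"),
        "title": text("title"),
        "author": text("dc:creator"),
        "published_gmt": text("wp:post_date_gmt"),
        "status": text("wp:status"),
        "categories": categories,
        "tags": tags,
        "body": text("content:encoded"),
    }


# Trimmed-down version of the export above, wrapped with assumed namespaces.
SAMPLE = """<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:wp="http://wordpress.org/export/1.0/">
<channel><item>
<title>Interesting words in your OSX Dictionary</title>
<dc:creator>Sho</dc:creator>
<category><![CDATA[Language]]></category>
<category domain="tag"><![CDATA[dictionary]]></category>
<content:encoded><![CDATA[Using Leopard? Try this.]]></content:encoded>
<wp:post_id>713</wp:post_id>
<wp:post_date_gmt>2008-03-11 03:03:46</wp:post_date_gmt>
<wp:post_name>fake-words-in-your-osx-dictionary</wp:post_name>
<wp:status>publish</wp:status>
<wp:post_type>post</wp:post_type>
</item></channel></rss>"""

root = ET.fromstring(SAMPLE)
for item in root.iter("item"):
    post = parse_wxr_item(item)
    print(post["id"], post["slug"], post["tags"])
```

The domain="tag" attribute is the only thing separating tags from categories in this export; everything else is a straight field lookup.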
Just for comparison, here’s the much nicer Atom feed. It obviously doesn’t have all the wp: internal data, but I much prefer the design:
<entry>
<author>
<name>Sho</name>
<uri>http://fukamachi.org/</uri>
</author>
<title type="html"><![CDATA[Interesting words in your OSX Dictionary]]></title>
<link rel="alternate" type="text/html" href="http://fukamachi.org/wp/2008/03/11/fake-words-in-your-osx-dictionary/" />
<id>http://fukamachi.org/wp/2008/03/11/fake-words-in-your-osx-dictionary/</id>
<updated>2008-03-11T03:04:31Z</updated>
<published>2008-03-11T03:03:46Z</published>
<category scheme="http://fukamachi.org/wp" term="Language" />
<category scheme="http://fukamachi.org/wp" term="leopard" />
<category scheme="http://fukamachi.org/wp" term="mac" />
<category scheme="http://fukamachi.org/wp" term="dictionary" />
<category scheme="http://fukamachi.org/wp" term="esquivalience" />
<summary type="html"><![CDATA[ Using Leopard? Try this. Look up the word esquivalience by selecting it and choosing dictionary from the contextual menu. Read the dictionary definition, then the wikipedia one underneath : )]]></summary>
<content type="html" xml:base="http://fukamachi.org/wp/2008/03/11/fake-words-in-your-osx-dictionary/">
<![CDATA[ <p>Using Leopard? Try this. Look up the word esquivalience by selecting it and choosing dictionary from the contextual menu. Read the dictionary definition, then the wikipedia one underneath : )</p>]]>
</content>
</entry>
Note the logical, consistent design, the self-closing tags, and other innovations.
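One practical cost of the inconsistency: the same publication time shows up in three different formats across the two samples, namely the RFC 822-style pubDate, the zone-less wp:post_date_gmt, and Atom’s RFC 3339 published. A minimal Python sketch that normalizes all three to timezone-aware UTC datetimes; the function names are hypothetical, and treating wp:post_date_gmt as UTC is an assumption based on its name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# The same publication instant, as written in the samples above.
rss_pub_date = "Tue, 11 Mar 2008 03:03:46 +0000"  # WXR <pubDate>, RFC 822 style
wp_post_date_gmt = "2008-03-11 03:03:46"          # WXR <wp:post_date_gmt>, no zone
atom_published = "2008-03-11T03:03:46Z"           # Atom <published>, RFC 3339


def from_rss(value):
    # email.utils parses RFC 822/2822 dates, including the numeric offset.
    return parsedate_to_datetime(value)


def from_wp_gmt(value):
    # The string carries no offset, so the UTC assumption is supplied by hand.
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)


def from_atom(value):
    # fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so swap it for an explicit offset to support older versions too.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


instants = {from_rss(rss_pub_date), from_wp_gmt(wp_post_date_gmt), from_atom(atom_published)}
print(len(instants))  # 1: all three strings describe the same moment
```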
UPDATE: Check out the comment format:
<wp:comment>
<wp:comment_id>3</wp:comment_id>
<wp:comment_author><![CDATA[(spam author redacted)]]></wp:comment_author>
<wp:comment_author_email>(redacted)</wp:comment_author_email>
<wp:comment_author_url>(redacted)</wp:comment_author_url>
<wp:comment_author_IP>127.0.0.1</wp:comment_author_IP>
<wp:comment_date>2005-07-16 10:23:48</wp:comment_date>
<wp:comment_date_gmt>2005-07-16 14:23:48</wp:comment_date_gmt>
<wp:comment_content>(offensive spam text redacted)</wp:comment_content>
<wp:comment_approved>1</wp:comment_approved>
<wp:comment_type></wp:comment_type>
<wp:comment_parent>0</wp:comment_parent>
</wp:comment>
The comment author is CDATA, but the content isn’t? WTF?
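For what it’s worth, the parser can’t tell the difference: CDATA is just an alternative way of escaping text in the serialized file, and after parsing both forms yield the same characters. A quick check with ElementTree, using made-up field values and an assumed wp: namespace URI:

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; the real one depends on the WordPress version.
WP = "http://wordpress.org/export/1.0/"
NS = {"wp": WP}

# Made-up values: the same text written once as CDATA, once as escaped text.
SNIPPET = f"""<wp:comment xmlns:wp="{WP}">
<wp:comment_author><![CDATA[Fish & Chips <fan>]]></wp:comment_author>
<wp:comment_content>Fish &amp; Chips &lt;fan&gt;</wp:comment_content>
</wp:comment>"""

root = ET.fromstring(SNIPPET)
author = root.findtext("wp:comment_author", namespaces=NS)
content = root.findtext("wp:comment_content", namespaces=NS)

# Once parsed, CDATA and escaped text are indistinguishable.
print(author)             # Fish & Chips <fan>
print(author == content)  # True
```

The catch with the non-CDATA form is that it only stays well-formed if the exporter remembers to escape & and < in every comment body.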