Thursday, February 05, 2004

Mark Pilgrim explains the different flavours of RSS 

I have often stated (1, 2, 3) that there are 7 different and incompatible versions of RSS. This was based on an embarrassingly simple formula: I counted the version numbers in use (0.90, 0.91, 0.92, 0.93, 0.94, 1.0, and 2.0) and came up with the number 7. But recently some people have taken to claiming that there are not 7 versions (despite obvious evidence to the contrary), and even if there are, that they are somehow compatible with each other so it doesn't really matter. So I dug a little further to precisely document the incompatible changes in each version of RSS.

I would like to publicly apologize for my previous misstatements. There are not 7 different and incompatible versions of RSS; there are 9.

In March of 1999, Netscape released RSS 0.90. RSS 0.90 looks like this:

Example 1. RSS 0.90


<rdf:RDF

xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

xmlns="http://my.netscape.com/rdf/simple/0.9/">

<channel>

<title>Mozilla Dot Org</title>

<link>http://www.mozilla.org</link>

<description>the Mozilla Organization web site</description>

</channel>

<image>

<title>Mozilla</title>

<url>http://www.mozilla.org/images/logo.gif</url>

<link>http://www.mozilla.org</link>

</image>

<item>

<title>New Status Updates</title>

<link>http://www.mozilla.org/status/</link>

</item>

</rdf:RDF>

In July of 1999, Netscape released RSS 0.91. Netscape's RSS 0.91 was intentionally incompatible with RSS 0.90. They dropped the RDF-compatible syntax and redesigned RSS to be pure XML. They also added a DTD which defined several allowable entities (more on these below).

Netscape's RSS 0.91 looks like this:

Example 2. Netscape RSS 0.91


<!DOCTYPE rss SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91">

<channel>

<title>Example Channel</title>

<link>http://example.com/</link>

<description>an example feed</description>

<language>en</language>

<rating>(PICS-1.1 "http://www.classify.org/safesurf/" l r (SS~~000 1))</rating>

<textinput>

<title>Search this site:</title>

<description>Find:</description>

<name>q</name>

<link>http://example.com/search</link>

</textinput>

<skipHours>

<hour>0</hour>

</skipHours>

<item>

<title>1 &lt; 2</title>

<link>http://example.com/1_less_than_2.html</link>

<description>1 &lt; 2, 3 &lt; 4.

In HTML, &lt;b&gt; starts a bold phrase

and you start a link with &lt;a href=

</description>

</item>

</channel>

</rss>

In June of 2000, Userland took Netscape's RSS specification, removed Netscape's copyright statement, made several incompatible changes, added a Userland copyright statement, called it RSS 0.91, and claimed that it was compatible with Netscape's RSS 0.91.

Userland's flavor of RSS 0.91 looks like this:

Example 3. Userland's RSS 0.91


<rss version="0.91">

<channel>

<title>Example Channel</title>

<link>http://example.com/</link>

<description>an example feed</description>

<language>en</language>

<rating>(PICS-1.1 "http://www.classify.org/safesurf/" l r (SS~~000 1))</rating>

<textInput>

<title>Search this site:</title>

<description>Find:</description>

<name>q</name>

<link>http://example.com/search</link>

</textInput>

<skipHours>

<hour>24</hour>

</skipHours>

<item>

<title>1 &lt; 2</title>

<link>http://example.com/1_less_than_2.html</link>

<description>1 &lt; 2, 3 &lt; 4.

In HTML, &lt;b&gt; starts a bold phrase

and you start a link with &lt;a href=

</description>

</item>

</channel>

</rss>

Userland's RSS 0.91 is incompatible with Netscape's RSS 0.91 in several ways:

  1. Netscape's RSS 0.91 specifies that <hour> within <skipHours> has a range from 0 to 23. Userland's RSS 0.91 specifies that <hour> has a range of 1 to 24.
  2. Netscape's RSS 0.91 contains a textinput element. Userland's RSS 0.91 contains a textInput element. Note the capitalization; XML element names are case-sensitive, so this is a completely different element.
  3. Netscape's RSS 0.91 uses a DTD which allows publishers to use 96 named entities: &nbsp;, &iexcl;, &cent;, &pound;, &curren;, &yen;, &brvbar;, &sect;, &uml;, &copy;, &ordf;, &laquo;, &not;, &shy;, &reg;, &macr;, &deg;, &plusmn;, &sup2;, &sup3;, &acute;, &micro;, &para;, &middot;, &cedil;, &sup1;, &ordm;, &raquo;, &frac14;, &frac12;, &frac34;, &iquest;, &Agrave;, &Aacute;, &Acirc;, &Atilde;, &Auml;, &Aring;, &AElig;, &Ccedil;, &Egrave;, &Eacute;, &Ecirc;, &Euml;, &Igrave;, &Iacute;, &Icirc;, &Iuml;, &ETH;, &Ntilde;, &Ograve;, &Oacute;, &Ocirc;, &Otilde;, &Ouml;, &times;, &Oslash;, &Ugrave;, &Uacute;, &Ucirc;, &Uuml;, &Yacute;, &THORN;, &szlig;, &agrave;, &aacute;, &acirc;, &atilde;, &auml;, &aring;, &aelig;, &ccedil;, &egrave;, &eacute;, &ecirc;, &euml;, &igrave;, &iacute;, &icirc;, &iuml;, &eth;, &ntilde;, &ograve;, &oacute;, &ocirc;, &otilde;, &ouml;, &divide;, &oslash;, &ugrave;, &uacute;, &ucirc;, &uuml;, &yacute;, &thorn;, and &yuml;. Userland's RSS 0.91 removes the DTD, therefore all of these named entities are invalid and may not be used.
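
The three differences above can be made concrete with a short Python sketch. It uses only the standard library's xml.etree.ElementTree; the helper names (find_text_input, hour_in_range, parses_without_dtd) and the one-line sample feeds are hypothetical and exist only for illustration. This is not how any particular aggregator does it.

```python
import xml.etree.ElementTree as ET

def find_text_input(channel):
    """Return the text input element under either spelling, or None."""
    # XML names are case-sensitive, so each spelling must be tried separately.
    for name in ("textInput", "textinput"):
        element = channel.find(name)
        if element is not None:
            return element
    return None

def hour_in_range(hour, flavor):
    """Check an <hour> value against the range each 0.91 flavor specifies."""
    if flavor == "netscape":
        return 0 <= hour <= 23
    if flavor == "userland":
        return 1 <= hour <= 24
    raise ValueError("unknown flavor: " + flavor)

def parses_without_dtd(xml_text):
    """Return True if the text is well-formed XML when no DTD is available."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

# Hypothetical one-line feeds, one per flavor.
netscape = ET.fromstring(
    "<rss version='0.91'><channel><textinput><name>q</name></textinput></channel></rss>"
)
userland = ET.fromstring(
    "<rss version='0.91'><channel><textInput><name>q</name></textInput></channel></rss>"
)
```

A lookup for textInput finds nothing in the Netscape feed, an hour of 24 is in range for one flavor and out of range for the other, and a feed that uses &eacute; with no DTD fails to parse at all.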

In December of 2000, the RSS-DEV Working Group released RSS 1.0, which they claimed was compatible with RSS 0.90. (In fact, it is completely incompatible and shares no elements with RSS 0.90 at all, since it uses a different namespace.) RSS 1.0 was also intentionally incompatible with both Netscape RSS 0.91 and Userland RSS 0.91, due to RSS 1.0's RDF syntax.

RSS 1.0 looks like this:

Example 4. RSS 1.0


<rdf:RDF

xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

xmlns="http://purl.org/rss/1.0/">

<channel>

<title>Example Dot Org</title>

<link>http://www.example.org</link>

<description>the Example Organization web site</description>

<items>

<rdf:Seq>

<rdf:li resource="http://www.example.org/status/"/>

</rdf:Seq>

</items>

</channel>

<image rdf:about="http://www.example.org/images/logo.gif"/>

<image rdf:about="http://www.example.org/images/logo.gif">

<title>Example</title>

<url>http://www.example.org/images/logo.gif</url>

<link>http://www.example.org</link>

</image>

<item rdf:about="http://www.example.org/status/">

<title>New Status Updates</title>

<link>http://www.example.org/status/</link>

<description>News about the Example project</description>

</item>

</rdf:RDF>
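
The claim that RSS 1.0 shares no elements with RSS 0.90 can be checked with any namespace-aware parser. Here is a minimal Python sketch using the standard library's xml.etree.ElementTree, which reports each element name qualified by its namespace. The two feeds are trimmed versions of examples 1 and 4; the child_names helper is hypothetical.

```python
import xml.etree.ElementTree as ET

RSS_090 = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://my.netscape.com/rdf/simple/0.9/">
  <channel><title>Mozilla Dot Org</title></channel>
</rdf:RDF>"""

RSS_10 = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/">
  <channel><title>Example Dot Org</title></channel>
</rdf:RDF>"""

def child_names(xml_text):
    """Return the namespace-qualified names of the root's direct children."""
    return [child.tag for child in ET.fromstring(xml_text)]

names_090 = child_names(RSS_090)
names_10 = child_names(RSS_10)

# Both feeds say <channel>, but the qualified names differ.
print(names_090)
print(names_10)
```

To a parser that respects namespaces, the two <channel> elements have nothing in common except their local name.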

Later in December of 2000, Userland released RSS 0.92, which they claimed was compatible with their flavor of RSS 0.91.

Example 5. RSS 0.92


<rss version="0.92">

<channel>

<title>Example Channel</title>

<link>http://example.com/</link>

<description>an example feed</description>

<language>en</language>

<rating>(PICS-1.1 "http://www.classify.org/safesurf/" l r (SS~~000 1))</rating>

<textInput>

<title>Search this site:</title>

<description>Find:</description>

<name>q</name>

<link>http://example.com/search</link>

</textInput>

<skipHours>

<hour>24</hour>

</skipHours>

<item>

<title>1 &lt; 2</title>

<link>http://example.com/1_less_than_2.html</link>

<description>1 &lt; 2, 3 &lt; 4.

In HTML, &lt;b&gt; starts a bold phrase

and you start a link with &lt;a href=

</description>

</item>

</channel>

</rss>

RSS 0.92 is incompatible with Netscape RSS 0.91 for all the reasons that Userland RSS 0.91 is incompatible with Netscape RSS 0.91. It is also incompatible with Userland RSS 0.91, because the content model of <description> was changed from plain text to HTML. The RSS 0.92 example (example 5) appears identical to the Userland RSS 0.91 example (example 3) in every way except the version number, but it means something different. To create an RSS 0.92 feed that means the same thing as example 3, you need to escape the <description>, like this:

Example 6. RSS 0.92, properly escaped


<rss version="0.92">

<channel>

<title>Example Channel</title>

<link>http://example.com/</link>

<description>an example feed</description>

<language>en</language>

<rating>(PICS-1.1 "http://www.classify.org/safesurf/" l r (SS~~000 1))</rating>

<textInput>

<title>Search this site:</title>

<description>Find:</description>

<name>q</name>

<link>http://example.com/search</link>

</textInput>

<skipHours>

<hour>24</hour>

</skipHours>

<item>

<title>1 &lt; 2</title>

<link>http://example.com/1_less_than_2.html</link>

<description>1 &amp;lt; 2, 3 &amp;lt; 4.

In HTML, &amp;lt;b&amp;gt; starts a bold phrase

and you start a link with &amp;lt;a href=

</description>

</item>

</channel>

</rss>
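
The extra layer of escaping in example 6 is what you get when a publisher treats the text as HTML first and XML second. A rough Python sketch of that idea, using only the standard library (html.escape and xml.etree.ElementTree); the description_xml function and its version parameter are hypothetical names for illustration, and the version check is simplified to "0.91 is plain text, anything else is HTML".

```python
import html
import xml.etree.ElementTree as ET

def description_xml(plain_text, version):
    """Serialize a <description> carrying plain text under a given content model."""
    element = ET.Element("description")
    if version == "0.91":
        # Plain-text content model: the XML serializer's own escaping is enough.
        element.text = plain_text
    else:
        # HTML content model (0.92 and later, as described above): escape for
        # HTML first, then let the serializer escape a second time for XML.
        element.text = html.escape(plain_text, quote=False)
    return ET.tostring(element, encoding="unicode")

as_091 = description_xml("1 < 2", "0.91")
as_092 = description_xml("1 < 2", "0.92")

print(as_091)  # <description>1 &lt; 2</description>
print(as_092)  # <description>1 &amp;lt; 2</description>
```

The same plain text produces two different byte sequences depending on the version, which is the sense in which examples 3 and 5 look alike but mean different things.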

In April of 2001, Userland released a draft of RSS 0.93, which they claimed was compatible with RSS 0.92 and their flavor of RSS 0.91. Although never officially blessed for public use, RSS 0.93 is in fact currently being used by companies as large as Disney (who is quite proud of it). RSS 0.93 shares the same content model as RSS 0.92, and is therefore incompatible with all versions of RSS prior to 0.92. It also adds an optional <expirationDate> element, the significance of which will become apparent shortly.

RSS 0.93 looks like this:

Example 7. RSS 0.93


<rss version="0.93">

<channel>

<title>Example Channel</title>

<link>http://example.com/</link>

<description>an example feed</description>

<language>en</language>

<rating>(PICS-1.1 "http://www.classify.org/safesurf/" l r (SS~~000 1))</rating>

<textInput>

<title>Search this site:</title>

<description>Find:</description>

<name>q</name>

<link>http://example.com/search</link>

</textInput>

<skipHours>

<hour>24</hour>

</skipHours>

<item>

<title>1 &lt; 2</title>

<link>http://example.com/1_less_than_2.html</link>

<description>1 &amp;lt; 2, 3 &amp;lt; 4.

In HTML, &amp;lt;b&amp;gt; starts a bold phrase

and you start a link with &amp;lt;a href=

</description>

<expirationDate>Sat, 29 Nov 2003 10:17:13 GMT</expirationDate>

</item>

</channel>

</rss>

In August of 2002, Userland released a draft of RSS 0.94, which they claimed was compatible with RSS 0.93, RSS 0.92, and their flavor of RSS 0.91. Although never officially blessed for public use, RSS 0.94 is currently being used by several popular technically-oriented sites such as Ars Technica, as well as the official project feed for at least one RSS aggregator.

RSS 0.94 is incompatible with all previous versions of RSS in several ways:

  1. RSS 0.94 is incompatible with RSS 0.93, because RSS 0.94 drops the <expirationDate> element introduced in RSS 0.93.
  2. RSS 0.94 introduces a significant change to the content model: a new type attribute on the <description> element, which gives the MIME type of the description. The default type is "text/html", which means that if not specified, RSS 0.94 shares the content model of RSS 0.92, and is therefore incompatible with all versions of RSS prior to RSS 0.92. And if type is specified, RSS 0.93-aware clients that do not know about the new attribute will misinterpret the content by incorrectly assuming it is HTML.
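
The second point is easier to see in code. This is a small Python sketch, assuming a consumer that picks a content type purely from the declared version and the type attribute; the description_type function is hypothetical, and the rules it encodes are only the ones stated in this article.

```python
import xml.etree.ElementTree as ET

def description_type(description, version):
    """Return the MIME type a consumer would assume for a <description>."""
    if version == "0.91":
        return "text/plain"
    if version == "0.94":
        # 0.94 honors the type attribute and defaults to HTML.
        return description.get("type", "text/html")
    # 0.92 and 0.93 know nothing about a type attribute, so it is ignored.
    return "text/html"

typed = ET.fromstring('<description type="text/plain">1 &lt; 2</description>')
untyped = ET.fromstring("<description>1 &amp;lt; 2</description>")

print(description_type(typed, "0.94"))    # text/plain
print(description_type(typed, "0.93"))    # text/html
print(description_type(untyped, "0.94"))  # text/html
```

The middle line is the misinterpretation: the same element that a 0.94 client reads as plain text is read as HTML by a 0.93 client.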

Due to odd historical circumstances, no official copies of the RSS 0.94 specification exist. The above-linked RSS 0.94 specification incorrectly claims that it describes RSS 2.0.

RSS 0.94 looks like this:

Example 8. RSS 0.94


<rss version="0.94">

<channel>

<title>Example Channel</title>

<link>http://example.com/</link>

<description>an example feed</description>

<language>en</language>

<rating>(PICS-1.1 "http://www.classify.org/safesurf/" l r (SS~~000 1))</rating>

<textInput>

<title>Search this site:</title>

<description>Find:</description>

<name>q</name>

<link>http://example.com/search</link>

</textInput>

<skipHours>

<hour>24</hour>

</skipHours>

<item>

<title>1 &lt; 2</title>

<link>http://example.com/1_less_than_2.html</link>

<description type="text/plain">1 &lt; 2, 3 &lt; 4.

In HTML, &lt;b&gt; starts a bold phrase

and you start a link with &lt;a href=

</description>

</item>

</channel>

</rss>

In September of 2002, Userland released RSS 2.0, which they claimed was compatible with RSS 0.94, RSS 0.93, RSS 0.92, and their flavor of RSS 0.91. RSS 2.0 is incompatible with all previous versions of RSS in several ways:

  1. RSS 2.0 drops the <rating> element that was allowed in Netscape's RSS 0.91, Userland's RSS 0.91, RSS 0.92, RSS 0.93, and RSS 0.94.
  2. RSS 2.0 drops the type attribute introduced in RSS 0.94, because it is a mistake to add confusion to the all-important description element. The RSS 2.0 specification states that <description> may contain HTML, but there is no way for consumers to programmatically distinguish HTML from plain text (especially text that talks about markup). In other words, the content model for RSS 2.0 is "Here's something that might be HTML. Or maybe not. I can't tell you, and you can't guess."

RSS 2.0 looks like this:

Example 9. RSS 2.0


<rss version="2.0">

<channel>

<title>Example Channel</title>

<link>http://example.com/</link>

<description>an example feed</description>

<language>en</language>

<textInput>

<title>Search this site:</title>

<description>Find:</description>

<name>q</name>

<link>http://example.com/search</link>

</textInput>

<skipHours>

<hour>24</hour>

</skipHours>

<item>

<title>1 &lt; 2</title>

<link>http://example.com/1_less_than_2.html</link>

<description>1 &amp;lt; 2, 3 &amp;lt; 4.

In HTML, &amp;lt;b&amp;gt; starts a bold phrase

and you start a link with &amp;lt;a href=

</description>

</item>

</channel>

</rss>

In November of 2002, Userland released RSS 2.01, which they claimed was compatible with RSS 2.0, RSS 0.94, RSS 0.93, RSS 0.92, and their flavor of RSS 0.91. RSS 2.01 changes the semantics of the <skipHours> element. In RSS 0.94, RSS 0.93, RSS 0.92, and Userland's RSS 0.91, hours had a range of 1 to 24. In RSS 2.01, hours now have a range of 0 to 23. RSS 2.01 shares the content model of RSS 2.0, which means it is incompatible with RSS 0.94 and all versions of RSS prior to RSS 0.92.

RSS 2.01 looks like this:

Example 10. RSS 2.0, post-11/11/2002 (RSS 2.01)


<rss version="2.0">

<channel>

<title>Example Channel</title>

<link>http://example.com/</link>

<description>an example feed</description>

<language>en</language>

<textInput>

<title>Search this site:</title>

<description>Find:</description>

<name>q</name>

<link>http://example.com/search</link>

</textInput>

<skipHours>

<hour>0</hour>

</skipHours>

<item>

<title>1 &lt; 2</title>

<link>http://example.com/1_less_than_2.html</link>

<description>1 &amp;lt; 2, 3 &amp;lt; 4.

In HTML, &amp;lt;b&amp;gt; starts a bold phrase

and you start a link with &amp;lt;a href=

</description>

</item>

</channel>

</rss>

The RSS 2.01 specification was published in place of the RSS 2.0 specification; no official copies of the RSS 2.0 specification exist. As you can see from example 10, RSS 2.01 feeds use the same "2.0" version number as RSS 2.0, making it impossible to programmatically distinguish them. All RSS 2.0 feeds must be assumed to be RSS 2.01 feeds, despite the fact that RSS 2.01 is incompatible with RSS 2.0. This means that, if you published a valid RSS 2.0 feed on November 10th that contained <hour>24</hour>, you would wake up on November 11th to find that your feed had become invalid while you slept.
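
To see why this is a problem for anyone writing a validator, consider a Python sketch in which the only way to judge an <hour> is to know which text of the specification was in force. The skip_hour_is_valid function is hypothetical; the cutoff date comes from the caption of example 10, and the 1 to 24 range for the original RSS 2.0 is an assumption based on example 9.

```python
from datetime import date

# Date on which RSS 2.01 replaced RSS 2.0, per the caption of example 10.
SPEC_CHANGE = date(2002, 11, 11)

def skip_hour_is_valid(hour, spec_date):
    """Validate an <hour> against whichever spec text applied on spec_date."""
    if spec_date < SPEC_CHANGE:
        return 1 <= hour <= 24  # RSS 2.0 as first published (assumed range)
    return 0 <= hour <= 23      # RSS 2.01, still labeled version="2.0"

print(skip_hour_is_valid(24, date(2002, 11, 10)))  # True
print(skip_hour_is_valid(24, date(2002, 11, 11)))  # False
```

The function needs a date, and the feed does not carry one. A validator that sees only version="2.0" has nothing to pass as the second argument.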

In January of 2003, Userland changed the already-published RSS 2.01 specification, to add a <rating> element again. The content model remains the same, which means RSS 2.01 is still incompatible with RSS 0.94 and all versions of RSS prior to RSS 0.92.

RSS 2.01 now looks like this:

Example 11. RSS 2.0, post-1/21/2003 (RSS 2.01 rev 2)


<rss version="2.0">

<channel>

<title>Example Channel</title>

<link>http://example.com/</link>

<description>an example feed</description>

<language>en</language>

<rating>(PICS-1.1 "http://www.classify.org/safesurf/" l r (SS~~000 1))</rating>

<textInput>

<title>Search this site:</title>

<description>Find:</description>

<name>q</name>

<link>http://example.com/search</link>

</textInput>

<skipHours>

<hour>0</hour>

</skipHours>

<item>

<title>1 &lt; 2</title>

<link>http://example.com/1_less_than_2.html</link>

<description>1 &amp;lt; 2, 3 &amp;lt; 4.

In HTML, &amp;lt;b&amp;gt; starts a bold phrase

and you start a link with &amp;lt;a href=

</description>

</item>

</channel>

</rss>

Once again, the new RSS 2.01 specification was published in place over the old specification; no official copies of the previous version of the RSS 2.01 specification exist. Neither the revision number of the spec ("2.01") nor the version number of the format ("2.0") was changed, making it impossible to programmatically distinguish between them. This means that if a feed contains a <rating> element and declares itself as RSS 2.0, it is impossible to know whether the feed is valid unless you also know when the feed was created.

Summary

There are 9 versions of RSS, all of which are incompatible with various other versions. RSS 0.90 is incompatible with Netscape's RSS 0.91, Netscape's RSS 0.91 is incompatible with Userland's RSS 0.91, Netscape's RSS 0.91 is incompatible with RSS 1.0, Userland's RSS 0.91 is incompatible with RSS 0.92, RSS 0.92 is incompatible with RSS 0.93, RSS 0.93 is incompatible with RSS 0.94, RSS 0.94 is incompatible with RSS 2.0, and RSS 2.0 is incompatible with itself.
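
As a closing illustration, here is roughly how far a consumer can get by inspecting the root element, sketched in Python with the standard library's xml.etree.ElementTree. The sniff_flavor function and its return strings are hypothetical, and the sketch deliberately ignores the DOCTYPE, which ElementTree does not expose.

```python
import xml.etree.ElementTree as ET

RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"
NS_090 = "http://my.netscape.com/rdf/simple/0.9/"
NS_10 = "http://purl.org/rss/1.0/"

def sniff_flavor(xml_text):
    """Guess the RSS flavor from the root element, its children, and version."""
    root = ET.fromstring(xml_text)
    if root.tag == RDF_ROOT:
        # The RDF flavors differ only in the namespace of their children.
        child_tags = {child.tag for child in root}
        if "{%s}channel" % NS_090 in child_tags:
            return "RSS 0.90"
        if "{%s}channel" % NS_10 in child_tags:
            return "RSS 1.0"
        return "unknown RDF"
    if root.tag == "rss":
        version = root.get("version")
        if version == "0.91":
            return "RSS 0.91 (Netscape or Userland)"
        if version == "2.0":
            return "RSS 2.0 or 2.01"
        return "RSS " + str(version)
    return "unknown"

print(sniff_flavor('<rss version="0.91"><channel/></rss>'))
print(sniff_flavor('<rss version="2.0"><channel/></rss>'))
```

Two of the answers are not really answers. The version attribute alone cannot separate the two flavors of 0.91, and nothing in the feed separates RSS 2.0 from either revision of RSS 2.01.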


[dive into mark]
