Thu May 5 00:03:06 PDT 2011

web standards deathmatch

Shit's fucked! God damn it, WHATWG!

Let me start at the beginning.

A not terribly well-known feature of OpenID is delegated authentication. In brief, it's two <link> elements that point to an OpenID identity provider; in my case, myopenid.com. When I poke "bbot.org" into the login form on a site that accepts OpenID (in specificationese, an "OpenID consumer"), it's supposed to follow the links to myopenid.com (an "identity provider"), which does all the heavy lifting, and in the end I show up as "bbot.org". I can change the identity provider to whatever (Google, AOL, LiveJournal) and still retain the same identity, which is actually pretty neat. There's also the benefit of hiding myopenid's ugly URL. (bbot.myopenid.com, ugh.)
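For the curious, here is roughly what those two elements look like, generated by a small Python sketch. The rel values are the OpenID 1.1 ones (OpenID 2.0 uses openid2.provider and openid2.local_id instead); the function name and the server endpoint URL are assumptions for illustration, not copied from my actual page.

```python
from html import escape


def delegation_links(provider_endpoint: str, local_identity: str) -> str:
    """Build the two OpenID 1.1 delegation <link> elements as HTML text."""
    pairs = [
        ("openid.server", provider_endpoint),   # where the consumer sends the login request
        ("openid.delegate", local_identity),    # the identity the provider actually knows
    ]
    return "\n".join(
        f'<link rel="{rel}" href="{escape(href, quote=True)}">'
        for rel, href in pairs
    )


# Assumed URLs: the endpoint is a guess at myopenid's server address.
LINKS = delegation_links(
    "https://www.myopenid.com/server",
    "https://bbot.myopenid.com/",
)
print(LINKS)
```

Swapping identity providers means changing the two href values and nothing else, which is the whole appeal.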

So far, so easy.

However, in HTML 4.01, <link> elements are supposed to be contained in a pair of matched <head></head> tags. But people are fallible, and so they tend to forget to close tags, or they misspell them, or they leave them out entirely; and so when a browser sees a <link> tag all by itself, it tries to do the right thing, and just parses it, rather than erroring out.

This kind of thing just drives the variety of person who writes specifications for programming languages up the wall. Zere vill be order in mein markup language! So the W3C wasted no time setting up a working group for XHTML, which specifies that tags must always be closed, everywhere, and empty elements like <br> have to be "self closing", viz. <br />. This didn't make a whole lot of sense, but the standards wonks waved their hands a lot and repeated "XML" a few dozen times, and since this was around 2000, back when XML was shit-hot and everybody loved it, this settled the argument.

However, programmers (like those at wordpress.com) tended to put the doctype declaration that marks a document as XHTML into their templates, because they deeply cared about web standards and wanted to promote their use. Then they would pass the templates off to customers, who would write outrageously malformed code, then complain when it failed validation.

Obviously the problem with this situation was ideological impurity. Since users didn't care about validation, the W3C announced, they would make them care. XHTML 2.0 would require browsers to stop rendering and display an error message at the very first parsing error. That'll show them!
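To make the difference concrete, here is a minimal sketch using Python's standard library as a stand-in for the two philosophies: xml.etree.ElementTree plays the strict XML parser, and html.parser plays the forgiving one. Neither is what a browser actually runs, and the function names are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record every start tag seen, without complaining about anything."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_strict(markup: str) -> bool:
    """XML rules: one well-formedness error rejects the whole document."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def parse_lenient(markup: str) -> list:
    """Tag-soup rules: take whatever is there and carry on."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


sloppy = "<p>first line<br>second line</p>"    # unclosed <br>
tidy = "<p>first line<br />second line</p>"    # XHTML-style self-closing

print(parse_strict(sloppy))   # False: the unclosed <br> is fatal
print(parse_strict(tidy))     # True
print(parse_lenient(sloppy))  # ['p', 'br']
```

One forgotten slash and the strict parser throws the entire document away, which is exactly the behaviour nobody wanted to ship to end users.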

Unfortunately, the W3C doesn't make browsers. They just write standards. Browsers are actually written by software vendors such as Apple, Google, Microsoft and the Mozilla Foundation. And none of these groups were terribly enamoured with a markup language that didn't really resemble HTML at all, was enormously fragile, and was used by exactly nobody on the internet.[1]

So the browser vendors took their toys and went home, forming the WHATWG, and started work on HTML 5, completely bypassing XHTML 2. It rapidly became apparent that nobody was actually going to implement XHTML 2, so the W3C killed it off.

My more boring readers will recall that I recently rewrote the front page of bbot.org to validate as HTML 5. One of the more amusing tricks of HTML 5 is that the <html>, <head>, and <body> tags aren't actually required, since the browser has to render a page correctly even if they're missing. I obligingly removed them, then chortled to myself as the page validated perfectly. (Standards wonks find their kicks in odd places.)

Except! No! As the even more boring among you noticed immediately, the OpenID spec says the delegated authentication links have to be inside a <head> element! Damn you, OpenID!
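Here is a minimal sketch of how a naive consumer might trip over this, assuming it only trusts <link> elements that appear after a literal <head> tag in the source. Real OpenID libraries differ in how they parse pages; the class name, function name, and sample markup are all invented for illustration.

```python
from html.parser import HTMLParser


class HeadLinkFinder(HTMLParser):
    """Collect openid.* <link> elements, but only inside an explicit <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            self.in_head = False  # a <body> tag ends the head, closed or not
        elif tag == "link" and self.in_head:
            attributes = dict(attrs)
            rel = attributes.get("rel") or ""
            if rel.startswith("openid"):
                self.found[rel] = attributes.get("href")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def discover(markup: str) -> dict:
    """Return the delegation links this naive consumer would accept."""
    finder = HeadLinkFinder()
    finder.feed(markup)
    finder.close()
    return finder.found


# Hypothetical link markup; the server URL is an assumption.
LINK_TAGS = (
    '<link rel="openid.server" href="https://www.myopenid.com/server">'
    '<link rel="openid.delegate" href="https://bbot.myopenid.com/">'
)
WITH_HEAD = f"<!DOCTYPE html><html><head>{LINK_TAGS}</head><body><p>hi</p></body></html>"
WITHOUT_HEAD = f"<!DOCTYPE html>{LINK_TAGS}<p>hi</p>"

print(discover(WITH_HEAD))     # both links found
print(discover(WITHOUT_HEAD))  # {} : no <head> tag, no delegation
```

A proper HTML 5 parser would infer the missing <head> and put the links inside it, so both pages mean the same thing to a browser. A consumer that scans for the tag itself sees nothing.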

This is a bug that has taken me seven months to discover, mostly because nobody uses OpenID anymore. I'd complain about compromising my perfect garden of pure ideology to make delegated authentication actually work, but that would be too ironic for words.


1: There's a whole bucket of implementation issues, too. Pop quiz, hotshot. What do you do when some user innocently forgets to close a tag in a comment on a blog post? Should the invalid markup in the comment wang the entire page? XML parsing is seriously expensive, computationally. Are you going to write a parser that checks the validity of every comment by Disqus' 35 million users, then reads the user's mind to figure out how they actually wanted to mark up the text?


File under: Linux