We Suck at HTTP

By Deane Barker

I absolutely loved this New York Times column which lamented the world of apps, where we don’t have the capability to link to content anymore:

Unlike web pages, mobile apps do not have links. They do not have web addresses. They live in worlds by themselves, largely cut off from one another and the broader Internet. And so it is much harder to share the information found on them.

Yes, yes, for the love of God yes.

We have broken HTTP. We’ve done it for years in fits and starts, but apps have completely broken it. HTTP was a good specification which we’ve steadily whittled away.

URLs have a purpose. We are very cavalier about that purpose. We don’t use canonical URLs. We’re sloppy about switching back and forth between HTTP and HTTPS. We don’t bother to logically structure our URLs. We rebuild websites and let all the links break. We don’t appreciate that crawlers are dumb and need more context than humans do.
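
A rough illustration of what “using canonicals” means in practice, as a Python sketch: pick one form for every URL and normalize everything to it. The rules below (always HTTPS, lowercase host, no default port, no trailing slash, no fragment) are assumptions chosen for the example, not a standard, and `canonicalize` is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit

# Ports implied by the scheme, which a canonical URL can leave out.
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(url: str) -> str:
    """Reduce a URL to one canonical form.

    Assumed house rules (illustrative, not a standard): always HTTPS,
    lowercase host, no default port, no trailing slash except on the
    root path, and no fragment. The query string is kept as-is.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    # Keep an explicit port only if it isn't the default for the scheme used.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(parts.scheme):
        host = f"{host}:{parts.port}"
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(("https", host, path, parts.query, ""))


print(canonicalize("HTTP://Example.COM:80/About/"))  # https://example.com/About
```

The output is what you’d put in a page’s `<link rel="canonical" href="...">` element, and what a permanent redirect from every other variant should point to.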

Did you know there’s something called a URN – Uniform Resource Name? This was supposed to be one level above a URL. Your resource would have a URN, which would be a global identifier, and it would resolve to a URL, which was just where the resource happened to be located right now. URNs never caught on, but the web would be better if they had. Content could then have a “name” that stayed attached to it forever, regardless of its current URL. (The “guid” element in RSS probably should have been named “urn,” in fact.)
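
Here’s a minimal sketch of the URN idea in Python, assuming a simple in-memory registry (the `UrnResolver` class is hypothetical; real URN resolution was meant to be a network service). It uses the `example` namespace that RFC 6963 reserves for documentation. The point is that the name stays fixed while the location underneath it moves.

```python
class UrnResolver:
    """Map permanent names (URNs) to wherever the resource lives right now."""

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def register(self, urn: str, url: str) -> None:
        # Registering again is how a move is recorded: the name never changes.
        if not urn.lower().startswith("urn:"):
            raise ValueError(f"not a URN: {urn!r}")
        self._locations[urn] = url

    def resolve(self, urn: str) -> str:
        # Raises KeyError for names nobody has registered.
        return self._locations[urn]


resolver = UrnResolver()
name = "urn:example:article:we-suck-at-http"
resolver.register(name, "http://old-site.example/articles/36.aspx")
# The site gets rebuilt; anything that kept the URN still finds the content.
resolver.register(name, "https://example.com/articles/we-suck-at-http")
print(resolver.resolve(name))
```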

And it’s not just URLs. HTTP status codes exist for a reason too. Did you know that there are a lot of them? In fact, there’s one for just about everything that could happen during a web request. Did you know there’s a difference between 404 and 410? 404 (“Not Found”) just means the server can’t find the resource, with no promise either way about whether it ever existed or might come back. 410 (“Gone”) means it was once here, it was removed on purpose, and it isn’t coming back. Big difference.
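
A minimal Python sketch of a server choosing between the two, assuming the site keeps a record of the pages it retired on purpose (the page sets here are hypothetical):

```python
from http import HTTPStatus

# Hypothetical site state: pages that exist, and pages retired on purpose.
LIVE_PAGES = {"/", "/about"}
RETIRED_PAGES = {"/2009-holiday-promo"}


def status_for(path: str) -> HTTPStatus:
    if path in LIVE_PAGES:
        return HTTPStatus.OK
    if path in RETIRED_PAGES:
        # We know it existed and we removed it deliberately: say so.
        return HTTPStatus.GONE
    # We have no record of it, so all we can honestly say is "not found".
    return HTTPStatus.NOT_FOUND


for path in ("/about", "/2009-holiday-promo", "/no-such-page"):
    status = status_for(path)
    print(path, status.value, status.phrase)
```

`HTTPStatus` is in the standard library, so the status line can carry the right phrase (“Gone”, “Not Found”) without anyone memorizing the numbers.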

Ever hear of 303 and 307? They’re redirects with specific jobs. 303 (“See Other”) tells the client to go fetch a different URL with a GET, which is exactly what you want after a form POST. 307 (“Temporary Redirect”) says the resource is somewhere else for now, and the client should repeat the same request, method and all, at the new address. Did you know there was a “402 Payment Required”? There’s a bunch that were just never implemented. These days a lot of websites just return “200 OK” for everything, even for pages that don’t exist, which drives me freaking nuts. (And, yes, I’m sure I’ve done it, so don’t go looking too hard through my portfolio…)
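
To make the 303/307 distinction concrete, here’s a sketch of two hypothetical handlers in Python, each returning a status and headers. Part of why 303 and 307 exist at all is that clients historically turned a POST into a GET when following a 302, so these two codes spell out each intent. The order numbering and the backup host are made up for illustration.

```python
from http import HTTPStatus
from itertools import count

_next_order_id = count(1)  # hypothetical stand-in for a database sequence


def after_form_post(form: dict[str, str]) -> tuple[HTTPStatus, dict[str, str]]:
    """Post/Redirect/Get: process the POST, then point the browser at the result."""
    order_id = next(_next_order_id)  # pretend we saved `form` as this order
    # 303: "the result is over there; go GET it." Refreshing the result page
    # repeats a harmless GET instead of resubmitting the order.
    return HTTPStatus.SEE_OTHER, {"Location": f"/orders/{order_id}"}


def while_under_maintenance(path: str) -> tuple[HTTPStatus, dict[str, str]]:
    # 307: "ask over there for now, with the same method and body."
    # A POST stays a POST, which a 302 never reliably guaranteed.
    return HTTPStatus.TEMPORARY_REDIRECT, {"Location": f"https://backup.example{path}"}
```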

(A new company called Words API (it’s an API…for words) made me jump for joy when I saw they are using actual, intelligent HTTP status codes on their responses, even their errors. If you go over your usage limit, for example, you get a “429 Too Many Requests” back. Good for them.)
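
Returning a 429 properly isn’t much work. Here’s a sketch of a fixed-window rate limiter in Python that answers with 429 and a `Retry-After` header (in seconds) once a client exceeds its allowance. The limit, the window, and the class name are assumptions for the example; this isn’t a description of how Words API implements theirs.

```python
import math
import time
from http import HTTPStatus


class FixedWindowLimiter:
    """Allow `limit` requests per client in each `window`-second window."""

    def __init__(self, limit: int, window: float, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests don't have to sleep
        self._state: dict[str, tuple[float, int]] = {}  # client -> (window start, count)

    def check(self, client: str) -> tuple[HTTPStatus, dict[str, str]]:
        now = self.clock()
        start, used = self._state.get(client, (now, 0))
        if now - start >= self.window:
            start, used = now, 0  # previous window expired; start a fresh one
        if used >= self.limit:
            # Tell the client how many whole seconds until the window resets.
            wait = math.ceil(start + self.window - now)
            return HTTPStatus.TOO_MANY_REQUESTS, {"Retry-After": str(wait)}
        self._state[client] = (start, used + 1)
        return HTTPStatus.OK, {}
```

The injectable `clock` is there so the behavior can be checked without actually waiting a minute.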

Do you know why your FORM tag has an attribute called “method”? Because you’re calling a method on a web server, like a method on an object in OOP. Did you know there are other methods besides GET and POST? There’s HEAD and OPTIONS and PUT and DELETE. And you can write your own. So if you’re passing data back and forth between your app/site and your web server, you’re welcome to use custom method names in the request line (the very first line of the request).
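
Custom methods really do work end to end. Here’s a self-contained Python sketch using only the standard library: Python’s `BaseHTTPRequestHandler` dispatches on the method name in the request line, so defining `do_ARCHIVE` is enough to handle `ARCHIVE /posts/36 HTTP/1.1`. The `ARCHIVE` method is invented for the example, and whether a given production server or proxy passes unfamiliar methods along is something you’d have to check.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    # "ARCHIVE /posts/36 HTTP/1.1" is routed to do_ARCHIVE by name.
    def do_ARCHIVE(self):
        body = f"archived {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def call(method: str, path: str) -> tuple[int, str]:
    """Start a throwaway local server, send one request, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        conn.request(method, path)  # http.client accepts any method token
        resp = conn.getresponse()
        result = resp.status, resp.read().decode()
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(call("ARCHIVE", "/posts/36"))
```

An unknown method like `FROB` gets a 501 Not Implemented back from the same server, which is exactly the status the spec provides for that situation.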

And, technically, GET requests are supposed to be “safe,” meaning they don’t change anything on the server. That also makes them idempotent: repeating a request has the same effect as making it once. So you should be able to hit Refresh all day on a GET request without causing any data change (beyond perhaps analytics). If you’re changing data on a server, that should always be a POST request (or PUT or DELETE, if anyone ever used them as intended).

I could go on and on. Don’t even get me started about URL parameters. No, not querystrings – there was originally a specification where you could attach parameters to path segments with semicolons, something like ‘/resource;key1=value1;key2=value2’, to pass data into a request. And what about the series of “UA-*” headers that existed to tell the web server about the rendering capabilities of the user agent? (And dare I wander off into metadata-related ranting…two words, people: Dublin Core!)
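
For the curious, RFC 2396 let each path segment carry its own parameters after a semicolon, something like `/cars;color=red;year=2012/tires`. Here’s a small Python sketch that pulls them apart; the function name and example URL are made up. Python’s `urllib.parse.urlparse` only splits off parameters on the last path segment, so the sketch splits every segment itself.

```python
from urllib.parse import unquote, urlsplit


def path_params(url: str) -> list[tuple[str, dict[str, str]]]:
    """Split each path segment into (name, {param: value}) on ';' and '='."""
    segments = []
    # urlsplit leaves ';' inside the path, unlike urlparse.
    for raw in urlsplit(url).path.split("/")[1:]:
        name, *params = raw.split(";")
        parsed = {}
        for param in params:
            key, _, value = param.partition("=")
            parsed[unquote(key)] = unquote(value)
        segments.append((unquote(name), parsed))
    return segments


print(path_params("http://example.com/cars;color=red;year=2012/tires"))
```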

My point is that a lot of web developers today are completely ignorant of the protocol that is the basis for their job. A core understanding of HTTP should be a base requirement for working in this business. Skipping it means ignoring a massive part of digital history (something we’re also very good at).

I’m currently working through HTTP: The Definitive Guide by O’Reilly. The book was written in 2002, but HTTP hasn’t changed much since then. It’s fascinating to read all the features built into HTTP that no one uses because they were never adopted or no one bothered to do some research before they re-solved a problem. There’s a lot of stuff in there that solves problems we’ve since programmed our way around. The designers of the spec were probably smarter than you, it turns out.

(HTTP/2 is currently proposed, but it doesn’t change much of the high-level stuff. The changes are mostly low-level data transport hacks, based on Google’s experience with SPDY.)

At the risk of sounding like a crabby old man (I’m 43 and have been developing for the web since 1996), this is one small symptom of a larger problem – developers tend to think they can solve every problem, and they’re pretty sure that nothing good happened before they arrived on the scene. Surely nobody working in this space 20 years ago could have known about their problems, so every problem deserves a new solution.

Developers often don’t know what they don’t know (I’ve confessed to this exact thing myself), and they feel no need to study the history of their technology to gain some context about it. Hell, we all need to sit down and read The Innovators together.

Narcissism runs rampant in this industry, and our willingness to throw away and ignore some of the core philosophies of HTTP is just one manifestation of this. Rant over.
