The Several Million Dollar Bug

In case you landed here without any context and have no idea who I am or what this article is about, this article should give you some background.

For years we were skating on very thin ice. The only advantage we had over our competitors was that we had figured out something they had not: Netscape, Microsoft and pretty much every other browser vendor had made a small but crucial mistake in implementing HTTP. It was an easy mistake to make, and avoiding it would have meant writing a lot of extra code that made no difference to normal, everyday HTTP requests.

RFC 1945 says:

The HTTP protocol is based on a request/response paradigm. A client
establishes a connection with a server and sends a request to the
server in the form of a request method, URI, and protocol version,
followed by a MIME-like message containing request modifiers, client
information, and possible body content. The server responds with a
status line, including the message's protocol version and a success
or error code, followed by a MIME-like message containing server
information, entity metainformation, and possible body content.

So, let’s get this straight: Client connects, client sends request, server responds to that request. Pay attention to the hairline cracks of ambiguity in those sentences and how they are commonly interpreted. The fact that it is written in that order seems to imply that it has to be implemented in that order.

This Stack Overflow question gets it just right:

My question might sound stupid, but I just wanted to be sure :

    Is it possible to send an HTTP response before having the request 
    for that resource ?

Say for example you have an HTML page index.html that only shows a 
picture called img.jpg. Now, if your server knows that a visitor will 
request the HTML file and then the jpg image every time :

Would it be possible for the server to send the image just after the 
HTML file to save time ?

I know that HTTP is a synchronous protocol, so in theory it should 
not work, but I just wanted someone to confirm it (or not).

Funny how people are still asking this question nearly 20 years later. And no, that was not a stupid question. Once upon a time I wondered about exactly that myself, and the answer surprised me. There really are no stupid questions, and checking such ‘obvious’ things sometimes pays off in a big way, which is probably what those responding to that SO question with a definitive ‘no’ should have done. Now, ignoring for the moment such details as ‘keepalive’, mixing content types (the question is about HTML and JPG) and the maximum number of parallel connections your browser has to the server: in theory, if you knew what the response was going to be before the request even arrived, would it work if you sent the response ahead of time?

And the answer to that question is a resounding YES. You can, it works like a dream, and that tiny little item was the sum total of our competitive advantage. HTTP may be designed as a synchronous protocol but it is not implemented as a synchronous protocol! I figured this out one afternoon while making an early version of the webcam software. The frame rate absolutely sucked: 1 frame per second, sometimes two, over an ISDN line with a round-trip time in the hundreds of milliseconds between server and cam. But with two channels bonded for a whopping 128 Kbit/s upstream we should have been able to do three fps, maybe more. And that’s a huge difference, the difference between ‘animated slideshow’ and ‘almost video’.
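A rough back-of-the-envelope model shows why waiting for a request per frame hurts so much on a link like this. Only the 128 Kbit/s figure comes from the text. The frame size and the exact round-trip time below are assumed values picked for illustration, and the function name is made up for this sketch.

```python
def frames_per_second(frame_bytes: int, link_bits_per_s: float, rtt_s: float = 0.0) -> float:
    """Frame rate when every frame costs its transmit time plus rtt_s of waiting."""
    transmit_s = frame_bytes * 8 / link_bits_per_s
    return 1.0 / (transmit_s + rtt_s)


LINK_BPS = 128_000    # two bonded ISDN channels (from the text)
FRAME_BYTES = 5_000   # ASSUMPTION: size of one JPEG frame, not given in the text
RTT_S = 0.3           # ASSUMPTION: the text only says "hundreds of milliseconds"

# One request/response round trip per frame versus frames sent back to back.
request_driven = frames_per_second(FRAME_BYTES, LINK_BPS, RTT_S)
pushed = frames_per_second(FRAME_BYTES, LINK_BPS)

print(f"waiting for a request per frame: {request_driven:.1f} fps")
print(f"sending without waiting:         {pushed:.1f} fps")
```

With these assumed numbers the round trip roughly halves the achievable frame rate, which is in the same ballpark as the gap described above. Different frame sizes or round-trip times will shift the figures.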

So I started toying with the idea of sending the response before the request was in, in fact totally ignoring the requests! Of course, my initial reaction was ‘this will never work, for sure the browser will discard the responses to requests that it hasn’t sent yet’. Only it did not! So that’s exactly how we ended up doing it: set the TCP buffer size to roughly the size of a frame, check whether the next frame would fit in its entirety, send it if it did and drop the whole thing if it did not. Instant rate adaptation, and maximum frame rate across any kind of connectivity, all the way up to what the hardware could capture/compress/transmit. On a LAN you’d get 15 fps at 320x240 (which was absolutely astounding at the time), whereas over the WAN links of the day we’d scale it down to something more leisurely (probably using a smaller image as well). But it was still quite good to look at, since it was simply JPEG frames (no incremental updates, so you got a whole frame of reasonably good quality rather than all kinds of blocky updates).
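The fit-or-drop rule is easy to play with in a small simulation. This is not the original webcam code, just a minimal model of the idea: a send buffer about one frame in size, drained by the link between captures. The function name, the 5,000-byte frame size and the link speeds are assumptions for illustration. On a real socket the buffer capacity would correspond to something like the `SO_SNDBUF` option.

```python
def simulate(capture_fps: int, link_bytes_per_s: float, frame_bytes: int,
             buffer_bytes: int, seconds: int) -> tuple[int, int]:
    """Return (frames handed to the socket, frames dropped) for a modelled send buffer."""
    queued = 0.0                      # bytes currently waiting in the modelled buffer
    sent = dropped = 0
    tick = 1.0 / capture_fps          # time between two captured frames
    for _ in range(int(seconds * capture_fps)):
        # The link drains the buffer during the time between two captures.
        queued = max(0.0, queued - link_bytes_per_s * tick)
        if queued + frame_bytes <= buffer_bytes:
            queued += frame_bytes     # the whole frame fits: send it
            sent += 1
        else:
            dropped += 1              # it would not fit in its entirety: drop it
    return sent, dropped


FRAME = 5_000                         # ASSUMPTION: bytes per JPEG frame

# 128 Kbit/s link (16,000 bytes/s) versus an assumed 10 Mbit/s LAN, 10 seconds each.
slow_link = simulate(15, 16_000, FRAME, FRAME, 10)
fast_lan = simulate(15, 1_250_000, FRAME, FRAME, 10)
print("slow link (sent, dropped):", slow_link)
print("LAN       (sent, dropped):", fast_lan)
```

The sender never needs to measure the link. A slow connection simply leaves the buffer full more often, so more frames are skipped, while a fast one takes every frame the camera produces.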

This little trick made lots of money, and I always wondered why our competitors didn’t catch on to the little bit of black magic that made it work. All they really had to do was capture some packets in flight and the secret would be out. So much for having a defensible moat; score one for ‘trade secrets’, even if they’re kept in the open. Sending the response before the request is in is not exactly valid in a synchronous client/server protocol, but to this day this trick works wonders. So if you know what the request is going to be, feel free to send the response ahead of time; it just might make you some money. At a minimum it will make your users happier, always a good thing.
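To make the ordering concrete, here is a tiny sketch using a connected socket pair in place of a real server and browser. It only demonstrates that the transport does not care who speaks first. Whether a given HTTP client accepts an early response is exactly the browser behaviour this article describes, and the sketch cannot prove that. The helper names, the request path and the response body are invented for the example.

```python
import socket

REQUEST = b"GET /frame.jpg HTTP/1.0\r\n\r\n"   # hypothetical request
RESPONSE = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 6\r\n"
    b"\r\n"
    b"frame\n"                                  # stand-in for real image data
)


def respond_early(server_sock: socket.socket) -> None:
    """Write a complete response without reading anything from the client first."""
    server_sock.sendall(RESPONSE)


def read_exactly(sock: socket.socket, count: int) -> bytes:
    """Read exactly count bytes, or fewer if the peer closes first."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


server_end, client_end = socket.socketpair()
respond_early(server_end)                             # the response goes out first
early = read_exactly(client_end, len(RESPONSE))       # the client already has it
client_end.sendall(REQUEST)                           # only now is the request sent
late_request = read_exactly(server_end, len(REQUEST))
server_end.close()
client_end.close()

print(early.split(b"\r\n")[0].decode())
```

The client holds the full response before it has written a single byte of its request, and the server can read (or ignore) that request whenever it likes.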

Every time a major new browser update or such was announced I’d lose lots of sleep, sure that the game would be up: this time they’d close the loophole and we’d be out of luck. But they never did! (Of course, that’s blind luck, but if you look long enough at any success I’m sure you’ll be able to identify a number of items that can only be described as lucky. I could rewrite this post in terms of how good we were at engineering, but that would be a total lie: we just got lucky that it worked and luckier still that it kept working as long as it did. SPDY now tries to make this kind of behaviour official.) Relying on obscure undocumented behaviour is rarely a smart move.

Of course, if you can build out your advantage beyond the trivial-to-copy then you should do that. But just because you don’t have a real technological advantage doesn’t mean you will be found out in time (or maybe even at all…). (Unless you blog about it of course, so I guess the secret is definitely out now.)

Jacques Mattheij

Technology, Coding and Business
