The other side of the moon: December 2010

Monday, December 27, 2010

Using bandwidth to mitigate latency

[This post is mirrored from the Performance Advent Calendar]

The craft of web performance has come a long way from YSlow and the first few performance best practices. Engineers the web over have made the web faster, and we now have newer tools, many more guidelines, much better browser support and a few dirty tricks to make our users' browsing experience as smooth and fast as possible. So how much further can we go?

Physics

The speed of light has a fixed upper limit, and depending on the medium it passes through, it may be lower still. In fibre, it is about 200,000 km/s, and electrical signals travel through copper at about the same speed. This means that a signal sent over a 7,000 km cable from New York to the UK would take about 35ms to get through in one direction. Channel capacity (the bit rate) is also limited by physics, and it's Shannon's Law that comes into play here.
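The arithmetic above is easy to check. The following is only a rough sketch: the function names are made up for illustration, the 200,000 km/s figure is the one quoted above, and the capacity function is the textbook Shannon-Hartley form C = B log2(1 + S/N) with a linear (not dB) signal-to-noise ratio.

```javascript
// Propagation speed in fibre, as quoted in the text (about 200,000 km/s).
const FIBRE_SPEED_KM_PER_S = 200000;

// One-way propagation delay in milliseconds for a cable of the given length.
function oneWayDelayMs(distanceKm, speedKmPerS = FIBRE_SPEED_KM_PER_S) {
  return (distanceKm * 1000) / speedKmPerS;
}

// A full round trip pays the one-way delay twice.
function roundTripMs(distanceKm, speedKmPerS = FIBRE_SPEED_KM_PER_S) {
  return 2 * oneWayDelayMs(distanceKm, speedKmPerS);
}

// Shannon-Hartley channel capacity in bits per second.
// bandwidthHz is the channel bandwidth; snr is a linear ratio, not decibels.
function shannonCapacityBps(bandwidthHz, snr) {
  return bandwidthHz * Math.log2(1 + snr);
}

console.log(oneWayDelayMs(7000)); // about 35 (ms)
console.log(roundTripMs(7000));   // about 70 (ms)
```

Note that this only covers propagation delay. Switching, queueing and serialisation all add to it, so real round trips are slower than this lower bound.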

This, of course, is the physical layer of our network. We have a few layers above that before we get to TCP, which does most of the dirty work to make sure that all the data that your application sends out actually gets to your client in the right order. And this is where it gets interesting.

TCP

Éric Daspet's article on latency includes an excellent discussion of how slow start and congestion control affect the throughput of a network connection, which is why Google has been experimenting with an increased TCP initial window size and wants to turn it into a standard. Each network roundtrip is limited by how long it takes photons or electrons to get through, and anything we can do to reduce the number of roundtrips should reduce total page download time, right? Well, it may not be that simple. We only really care about roundtrips that run end-to-end. Those that run in parallel need to be paid for only once.
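To get a feel for why the initial window matters, here is a deliberately simplified model of slow start. It assumes no packet loss, a congestion window that doubles every roundtrip, no receive-window limit, and it ignores the connection handshake. The function name, the 1460-byte segment size and the window sizes of 3 and 10 segments are illustrative assumptions, not measurements.

```javascript
// Count the roundtrips needed to deliver totalBytes under idealised slow start:
// the sender transmits one congestion window per roundtrip and then doubles it.
function slowStartRoundTrips(totalBytes, initialWindowSegments, mssBytes = 1460) {
  if (!(initialWindowSegments >= 1) || !(mssBytes >= 1)) {
    throw new RangeError('window and segment size must be at least 1');
  }
  let segmentsLeft = Math.ceil(totalBytes / mssBytes);
  let cwnd = initialWindowSegments; // congestion window, in segments
  let roundTrips = 0;
  while (segmentsLeft > 0) {
    segmentsLeft -= cwnd; // send a full window this roundtrip
    cwnd *= 2;            // idealised doubling, no loss
    roundTrips += 1;
  }
  return roundTrips;
}

// A 100KB response with a small and a large initial window.
console.log(slowStartRoundTrips(100 * 1024, 3));  // 5
console.log(slowStartRoundTrips(100 * 1024, 10)); // 4
```

Under these assumptions the larger window saves one roundtrip on a 100KB response, which over a 70ms transatlantic roundtrip is a noticeable fraction of the download time.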

When thinking about latency, we should remember that this is not a problem that has shown up in the last 4 or 5 years, or even with the creation of the Internet. Latency has been a problem whenever signals have had to be transmitted over a distance. Whether it is a rider on a horse, a signal fire (which incidentally has lower latency than light through fibre^[1]), a carrier pigeon or electrons running through metal, each has had its own problems with latency, and these are solved problems.

C-P-P

There are three primary ways to mitigate latency: cache, parallelise and predict^[2]. Caching reduces latency by bringing data as close as possible to where it's needed. We have multiple levels of cache including the browser's cache, ISP cache, a CDN and front-facing reverse proxies, and anyone interested in web performance already makes good use of these. Prediction is something that's gaining popularity, and Stoyan has written a lot about it. By pre-fetching expected content, we mitigate the effect of latency by paying for it in advance. Parallelism is what I'm interested in at the moment.

Multi-lane highways

Mike Belshe's research shows that bandwidth doesn't matter much, but what interests me most is that we aren't exploiting all of this unused channel capacity. Newer browsers do a pretty good job of downloading resources in parallel, and with a few exceptions (I'm looking at you Opera), can download all kinds of resources in parallel with each other. This is a huge change from just 4 years ago. However, are we, as web page developers, building pages that can take advantage of this parallelism? Is it possible for us to determine the best combination of resources on our page to reduce the effects of network latency? We've spent a lot of time, and done a good job combining our JavaScript, CSS and decorative images into individual files, but is that really the best solution for all kinds of browsers and network connections? Can we mathematically determine the best page layout for a given browser and network characteristics^[3]?

Splitting the combinative

HTTP Pipelining could improve throughput, but given that most HTTP proxies have broken support for pipelining, it could also result in broken user experiences. Can we parallelise by using the network the way it works today? For a high capacity network channel with low throughput due to latency, perhaps it makes better sense to open multiple TCP connections and download more resources in parallel. For example, consider these two pages I've created using Cuzillion:

Have a look at the page downloads using Firebug's Net Panel to see what's actually happening. In all modern browsers other than Opera, the second page should load faster, whereas in older browsers and in Opera 10, the first page should load faster.

Instead of combining JavaScript and CSS, split them into multiple files. How many depends on the browser and network characteristics. The number of parallel connections could start off based on the ratio of capacity to throughput and would reduce as network utilisation improved through larger window sizes over persistent connections. We're still using only one domain name, so no additional DNS lookup needs to be done. The only unknown is the channel capacity, but based on the source IP address and a geo lookup^[4] or subnet to ISP map, we could make a good guess. Boomerang already measures latency and throughput of a network connection, and the data gathered can be used to make statistically sound guesses.
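As a starting point, the capacity-to-throughput ratio could be turned into a connection count along these lines. This is only a sketch of the idea: the function name is hypothetical, the cap of 6 connections is an assumed per-host browser limit, and the inputs would have to come from whatever capacity estimate and throughput measurement you have available.

```javascript
// Suggest how many parallel connections to use for one host.
// capacityBps: estimated channel capacity in bits per second.
// throughputBps: measured throughput of a single connection in bits per second.
// maxConnections: assumed per-host connection limit of the browser.
function suggestedConnections(capacityBps, throughputBps, maxConnections = 6) {
  // With no usable measurement, fall back to a single connection.
  if (!(capacityBps > 0) || !(throughputBps > 0)) {
    return 1;
  }
  const ratio = Math.floor(capacityBps / throughputBps);
  // Never go below one connection or above the browser's limit.
  return Math.min(maxConnections, Math.max(1, ratio));
}

console.log(suggestedConnections(8e6, 2e6)); // 4
console.log(suggestedConnections(8e6, 1e6)); // 6 (capped)
console.log(suggestedConnections(1e6, 2e6)); // 1
```

As a single connection's throughput approaches the channel capacity, the ratio falls towards 1, which matches the idea that fewer connections are needed once window sizes have grown.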

I'm not sure if there will be any improvements or if the work required to determine the optimal page organisation will be worth it, but I do think it's worth more study. What do you think?

Footnotes

[1] Signal fires (or even smoke signals) travel at the speed of light in air, which is faster than light through fibre. However, the switching time for signal fires is far slower, and you're limited to line of sight.
[2] David A. Patterson. 2004. Latency lags bandwith [PDF]. Commun. ACM 47, 10 (October 2004), 71-75.
[3] I've previously written about my preliminary thoughts on the mathematical model.
[4] CableMap.info has good data on the capacity and latency of various backbone cables.

Thursday, December 02, 2010

Bad use of HTML5 form validation is worse than not using it at all

...or why I'm glad we don't use Fidelity any more

Fidelity's online stock trading account uses the HTML5 pattern attribute on its forms to do better form validation, only they get it horribly wrong.

First some background...

The pattern attribute that was added to input elements allows the page developer to tell the browser what kind of pre-validation to do on the form before submitting it to the server. This is akin to JavaScript-based form validation that runs through the form's onsubmit event, except that it's done before the form's onsubmit event fires. Pages can choose to use this feature while falling back to JS validation if it isn't available. They'd still need to do server-side validation, but that's a different matter. Unfortunately, when you get these patterns wrong, it's not possible to submit a valid value, and given how new this attribute is, many web devs have probably implemented it incorrectly without ever having tested on a browser that actually supports the attribute.
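A rough sketch of the fallback idea might look like this. The function names are made up for illustration, and the `doc` and `field` parameters are assumed to be a document and an input element (in a browser you would pass the global `document`); this is one way to do it, not a definitive implementation.

```javascript
// Returns true when the browser does not know about the pattern attribute,
// in which case the page has to validate with script instead.
function needsScriptValidation(doc) {
  return !('pattern' in doc.createElement('input'));
}

// Script fallback: check a field's value against its own pattern attribute.
// The pattern has to match the whole value, hence the ^(?: ... )$ wrapper.
function validateField(field) {
  const pattern = field.getAttribute('pattern');
  if (pattern === null) {
    return true; // nothing to check against
  }
  if (field.value === '') {
    return true; // empty values are not pattern-checked; that is "required"'s job
  }
  return new RegExp('^(?:' + pattern + ')$').test(field.value);
}
```

A page could call `needsScriptValidation(document)` once and, if it returns true, run `validateField` on each input from the form's onsubmit handler.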

This is the problem I faced with Fidelity. First some screenshots. This is what the wire transfer page looks like if you try to transfer $100. The message shows up when you click on the submit button:

[Screenshot: the validation error on Firefox 4]

[Screenshot: the validation error on Opera]

There are a few things wrong with this. First, to any reasonable human being (and any reasonably well programmed computer too), the amount entered (100.00) looks like a valid format for currency information. I tried alternatives like $100, $100.00, 100, etc., but ended up with the same errors for all of them. Viewing the source told me what the problem was. This is what the relevant portion of the page source looks like:

$<input type="text" name="RED_AMT" id="RED_AMT"
        maxlength="11" size="14" value="" tabindex="10"
        pattern="$#,###.##"
        type="currency" onblur="onBlurRedAmt();"/>

The onblur handler reformatted the value so that it always had two decimal places and a comma to separate thousands, but didn't do anything else. The form's onsubmit event handler was never called. The pattern attribute looked suspicious. This kind of pattern is what I'd expect for code written in COBOL, or perhaps something using Perl forms. The pattern attribute, however, is supposed to be a valid JavaScript regular expression, and the pattern in the code was either not a regular expression, or a very bad one that requires several # characters after the end of the string.
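You can see the effect by treating the pattern the way a browser roughly does, as a regular expression that must match the entire value. In the sketch below, `matchesPattern` is a made-up helper, and `candidatePattern` is only my guess at what a working currency pattern could look like, not anything taken from Fidelity's code.

```javascript
// Approximates browser behaviour: the pattern must match the whole value.
function matchesPattern(pattern, value) {
  return new RegExp('^(?:' + pattern + ')$').test(value);
}

// The pattern from the page source. As a regex, the leading $ asserts
// end of input and is then followed by literal # characters, so no
// value can ever match it.
const fidelityPattern = '$#,###.##';

// A hypothetical replacement: digits with optional thousands separators
// and an optional two-digit fraction.
const candidatePattern = '\\d{1,3}(?:,\\d{3})*(?:\\.\\d{2})?|\\d+(?:\\.\\d{2})?';

for (const value of ['100.00', '$100', '100', '1,000.00']) {
  console.log(value, matchesPattern(fidelityPattern, value), matchesPattern(candidatePattern, value));
}
```

Every value fails against the original pattern, which is why no amount could be submitted, while the replacement accepts 100.00, 100 and 1,000.00 and rejects $100 (the dollar sign is already printed outside the input).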

The code also omits the title attribute, which is strongly recommended for anything that uses the pattern attribute because it makes the resulting error message more meaningful, and in fact just usable. The result is that it's impossible to make a wire transfer using any browser that supports HTML5's new form element types and attributes. This is sad because while it looks like Fidelity had good intentions, they messed up horribly on the implementation, and put out an unusable product (unless of course you have Firebug or Greasemonkey and can mess with the code yourself).

I hacked up a little test page to see if I could reproduce the problem. It's reproducible in Firefox, but not in Opera, and I can't figure out why. (Any ideas?) Also notice how using a title attribute makes the error message clearer.

One last point in closing. It looks like both Firefox and Opera's implementations (on Linux at least) have a bad focus-grabbing bug. While the error message is visible, the browser grabs complete keyboard focus from the OS (or windowing environment if you prefer). This means you can't do things like Alt+Tab, PrtScrn, switching windows, etc. If you click the mouse anywhere, the error disappears. The result is that it's really hard to take a screenshot of the error message. I managed to do it by setting GIMP to take a screenshot of the entire desktop with a 4 second delay. The funny thing is that you can still use the keyboard within the browser to navigate through links, and this does not dismiss the error.

Update: The modality problem with the error message doesn't show up on Firefox 4 on Mac OS X.
