The other side of the moon

/bb|[^b]{2}/
Never stop Grokking

Sunday, January 10, 2010

Bandwidth test v1.1

I've just bumped the version number on the bandwidth test to 1.1. There were two major changes that I'll describe below.

Changes

The changes in this release were both detected when testing via mobile phones, but they should improve the test's reliability for users of all JavaScript-enabled browsers.
1. I noticed that on the Nokia E71, the latency test wasn't running. After much debugging (this browser doesn't have a Firebug equivalent), it turned out that the browser will not fire any events on an image object if the HTTP response's Content-Length is 0.

I was using a 0-byte file named image-l.png, so its Content-Type was set to image/png, but its Content-Length was 0. Most browsers fire the onerror event when this happens, but Nokia's browser, which is based on AppleWebKit/413, fires nothing. I then tried returning a 204 No Content HTTP response for the image, but I had the same problem. The only solution was to send some content. After playing around with several formats, I found that GIF produced the smallest image of all, at 35 bytes, so I used that. I haven't noticed any change in latency results on my desktop browser after the change, so I think it should be okay.

This also means that browsers will now fire the onload event instead of onerror, so I changed that code as well.
2. The second change fixes a bug in the code related to timed out images. This is what was happening.

A browser downloads images of progressively larger size until it hits an image that takes more than 3 seconds to download. When this happens, it aborts the run, and moves on to the next run, or to the latency test.

The bug was that even though the JavaScript thought it was aborting the run, the browser did not stop the download, even after the image object was set to null. As a result, the next run, or the latency check, was running in parallel with this big download. In some cases, this meant that the next run would be slower than it should have been, but in other cases, it meant that the images for the next run would block and not be downloaded until this big image completed. Those other images would then also time out, making the problem worse.

For the latency check, this meant that the image would never download, and that's why latency would show up as NaN -- I was trying to get the median of an empty array.

I fixed this by changing the timeout logic a bit. Now a timeout does not abort the run; it only sets an end-of-run flag. Once the currently downloading image completes, successfully or not, the handler sees the flag and terminates the run at that point. There are two benefits to this. The first is that this bug is fixed. The second is that we can now reduce the overall timeout since we are guaranteed to have at least one image load. So, the test should now complete faster.
3. A third minor change I made was in the timeout values for each image. I've increased them a little for the small images so that the test still works on really slow connections -- like AT&T's 2G network, which gives me about 30-40 kbps.
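
To get a feel for why timeouts matter so much on a connection like that, here is a rough back-of-the-envelope sketch. It assumes kbps means 1000 bits per second and ignores latency and HTTP overhead. The function names are made up for illustration and are not part of the bandwidth test, and the 3-second figure is simply the timeout mentioned earlier, used as an example.

```javascript
// Hypothetical helpers, not taken from the bandwidth test source.
// Assumption: 1 kbps = 1000 bits per second, which is 1 bit per millisecond.

// Largest payload (in bytes) that can finish downloading before the timeout.
function maxBytesWithinTimeout(kbps, timeoutMs) {
  var bitsPerMs = kbps;
  return (bitsPerMs * timeoutMs) / 8;
}

// Ideal download time (in milliseconds) for a payload of the given size.
function downloadMs(bytes, kbps) {
  return (bytes * 8) / kbps;
}

console.log(maxBytesWithinTimeout(30, 3000)); // 11250
console.log(maxBytesWithinTimeout(40, 3000)); // 15000
```

So at 30-40 kbps, only about 11 to 15 KB can arrive within 3 seconds even in the ideal case, and real transfers will do worse once latency is added.
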
All together, this should provide a more reliable test for everyone, and a test that actually works on mobile phones.
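
The end-of-run flag from change 2 can be sketched roughly as follows. This is a simplified illustration under my own assumptions, not the actual code from the test: startRun, loadImage and schedule are hypothetical names. loadImage(url, done) stands in for creating an image object and calling done(true) from onload or done(false) from onerror, and schedule defaults to setTimeout.

```javascript
// Hypothetical sketch of a run that is ended by a flag instead of being aborted.
function startRun(urls, loadImage, timeoutMs, onRunEnd, schedule) {
  schedule = schedule || setTimeout;
  var endOfRun = false; // set by a timeout, read by the image handler
  var results = [];
  var index = 0;

  function next() {
    var current = index;
    var url = urls[current];
    var started = Date.now();

    // The timeout never touches the download in flight. It only raises the
    // flag, and only if this image is still the one being downloaded.
    schedule(function () {
      if (current === index) { endOfRun = true; }
    }, timeoutMs);

    loadImage(url, function (ok) {
      results.push({ url: url, ok: ok, ms: Date.now() - started });
      index += 1;
      // The handler decides whether the run is over, so the next run can
      // never start while this image is still downloading.
      if (endOfRun || index >= urls.length) {
        onRunEnd(results);
      } else {
        next();
      }
    });
  }

  if (urls.length === 0) {
    onRunEnd(results);
  } else {
    next();
  }
}
```

Because the run only ends from inside the image handler, at least one image has always finished (successfully or not) by the time the next run or the latency check begins.
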

Thanks

I'd like to thank all my friends who tested this with their iPhones - that's where the timeout+parallel downloads bug was most visible, and there is no way I'd have fixed it without your help. Stay tuned for more posts on what I've learnt from it all.

So, go get the code, run your own tests, read my source code, challenge my ideas. If you think something isn't done correctly, let me know, or even better, send in a patch. The code is on github.

Short URL: http://tr.im/jsbwtest11

2 comments:

semanticvoid

Since the script picks the three largest images for BW calculation in every run, on a high-speed connection we see a low error (like 9% on my 6Mbps), as these image sizes provide a good estimate of the BW. But the same test run via mobile gives a higher error (> 40%) due to two things I see:
- the test times out on image_0 for run 3.
- the size of image 0, it seems, is not enough to get a good BW number (it gives a much lower BW number than the other images). This skews the distribution and hence the error.
E.g., the 7 readings from my iPhone:
Run 1: 33586, 56660, 38591
Run 2: 8343, 47367, 52435
Run 3: 8697

As seen, the 8K values are the cause of the large error.

Philip

What puzzles me right now is why the first image on runs 2 & 3 takes so long to load. The other images all seem to be fine.
