[philiptellis] /bb|[^b]{2}/
Never stop Grokking


Friday, January 29, 2010

Pretty print cron emails

We have a lot of scripts that run through cron and the output gets emailed by cron to whoever owns the cron job, or to a mailing list or something. Unfortunately, the headers for these mails are ugly and could use a lot of work. The primary problem I had with these mails was that the subject line could not be customised. Other smaller problems were that the From: address and Return-Path didn't have good values.

A sample header looks like this:
From: Cron Daemon <root@hostname.domain>
To: someone@somewhere.com
Subject: Cron <root@hostname> /usr/local/bin/foobar --whatever
Of the above, only the To: address makes any sense. The first thing I tried to change was the subject line. This is fairly easy using the mail program:
10 15 * * * /usr/local/bin/foobar --whatever | mail -s "Pretty subject" someone@somewhere.com
The mail now looks like this:
From: Cron Daemon <root@hostname.domain>
To: someone@somewhere.com
Subject: Pretty subject
This is a big step forward, because we can now filter emails based on subject. It still doesn't let us change the from address though. That's when I remembered using sendmail a long time ago to send mail through perl scripts, so I had a look through the sendmail man page, and found a few interesting flags:
-F full_name
       Set  the  sender full name. This is used only with messages that
       have no From: message header.

-f sender
       Set the envelope sender  address.  This  is  the  address  where
       delivery  problems  are  sent to, unless the message contains an
       Errors-To: message header.
This lets you set the From header, but there's no way to change the Subject. Since we can't use mail and sendmail together, we can either set the From address or the Subject, but not both. I then remembered the -t flag to sendmail that tells it to read the header from the message itself. The only thing left to do was to add the header into the output of every command. Since I couldn't easily do this in cron, I wrote my own script to do it. It's called pmail. The p is for pretty.

Get the script now

The meat of the script is this one line:
( echo -e "From: $from_name <$from>\r\nSubject: $subject\r\nTo: $to\r\n\r\n"; cat ) | /usr/sbin/sendmail -t
It writes the From, Subject and To headers, then leaves a blank line, and then passes its STDIN directly on to sendmail. The script has 3 optional parameters (the sender's address, sender's name and subject) and one mandatory parameter: the recipient's address(es).
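
To make the shape of the message concrete, here's a rough sketch of the same header block assembled in JavaScript. It is not the pmail script itself. The function names and the line-break check are my own assumptions, added because a stray newline in a subject would otherwise let extra headers sneak in.

```javascript
// Hypothetical helper (not part of pmail): refuse values that contain line
// breaks, since those could inject extra headers into the message.
function headerValue(name, value) {
  if (/[\r\n]/.test(value)) {
    throw new Error(name + " must not contain line breaks");
  }
  return value;
}

// Build the text that would be piped to `sendmail -t`.
function buildMessage(opts, body) {
  const headers = [
    "From: " + headerValue("from_name", opts.fromName) +
      " <" + headerValue("from", opts.from) + ">",
    "Subject: " + headerValue("subject", opts.subject),
    "To: " + headerValue("to", opts.to),
  ];
  // A blank line separates the headers from the body.
  return headers.join("\r\n") + "\r\n\r\n" + body;
}

const msg = buildMessage(
  {
    fromName: "Philip Tellis",
    from: "philip@bluesmoon.info",
    subject: "Pretty mail from foobar",
    to: "someone@somewhere.com",
  },
  "output of foobar\n"
);
console.log(msg);
```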

To use it, set your cron line something like this:
* * * * * /usr/local/bin/foobar --whatever | \
   pmail -F "Philip Tellis" -f "philip@bluesmoon.info" -s "Pretty mail from foobar" someone@somewhere.com
Note: lines wrapped for readability.

You can also add the current date to the subject line like this:
... -s "Foobar mail on $( date +\%Y-\%m-\%d )" ...
Remember that you have to escape % signs in crontab lines otherwise cron will translate them to newline characters.

Umm, you also need to have sendmail set up correctly, and doing that is your problem :)

It's not complete yet, but it works for my use case. If you like it, use it. It's under the BSD license. I'll probably add a Reply-To option if it makes sense for me, and at some point add in data validation too, but not today.

Enjoy.

Tuesday, January 26, 2010

Speaking at FOSDEM 2010

I'll be speaking at FOSDEM 2010 in Brussels, Belgium on the 6th and 7th of February. I'll be speaking about the YUI Flot library. Registration is free, so if you're in the area, just show up. If anyone's interested in a performance BoF, let's do that as well.

I'm speaking at FOSDEM, the Free and Open Source Software Developers' European Meeting

Monday, January 25, 2010

Speaking at ConFoo

I'll be speaking at ConFoo in Montreal this March. I have two talks on performance and scaling. If you're in the area, drop in. The conference line-up is great, and if PHP Quebec last year was any indication, it should be very enlightening.

confoo.ca Web Techno Conference

Tuesday, January 19, 2010

Bandwidth test v1.2

After testing this out for about a week, I'm ready to release v1.2 of the bandwidth testing code.

Get the ZIP file or Try it online

Changes

The changes in this release are all statistical, and related to the data I've collected while running the test.
  1. Switch from the geometric mean to the arithmetic mean, because data for a single user is more or less centrally clustered.

    This is not true across users, but for readings for a single user (single connection actually), I've found that the data does not really have outliers.
  2. Instead of downloading all images on every run, use the first run as a pilot test, and based on the results from that run, only download the 3 largest images that can be downloaded for this user.

    This allows us to download fewer bytes, and get more consistent results across runs.
  3. Add random sampling. In previous versions, the test would be run for every page view. By adding random sampling, we now only run it on a percentage of page views.

    You can now set PERFORMANCE.BWTest.sample to a number between 0 and 100, and that's the percentage of page views that will be tested for bandwidth. Note that this is not an integer, so you can set it to 0.1 or 10.25, or anything as long as it's between 0 and 100 (both inclusive). Setting it to 0 means that the test isn't run for anyone, so you probably don't want that, and 100 means that the test is run for everybody. This is also the default.
  4. Fire an onload event when the script completes loading. This is mainly so that you can asynchronously load the script so that it does not affect your page load time. Instead of checking for script.onload or onreadystatechange, you can just implement the PERFORMANCE.BWTest.onload method.
You can see the entire history on github.
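
For the curious, a percentage-based sampling check along the lines of item 3 could look like the sketch below. This is not the library's actual code. shouldRunTest and its injectable random argument are names I've made up for illustration.

```javascript
// Hypothetical helper, not the library's actual code.
// samplePercent: a number from 0 to 100 (fractions such as 0.1 are allowed).
// random: a function returning a number in [0, 1); defaults to Math.random.
function shouldRunTest(samplePercent, random = Math.random) {
  // Written this way so that NaN is rejected as well.
  if (!(samplePercent >= 0 && samplePercent <= 100)) {
    throw new RangeError("sample must be between 0 and 100");
  }
  // random() * 100 lies in [0, 100), so 0 never passes and 100 always does.
  return random() * 100 < samplePercent;
}

if (shouldRunTest(10.25)) {
  console.log("this page view falls into the sample");
}
```

Passing the random function in makes the boundary cases easy to check without relying on luck.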

Sunday, January 17, 2010

The statistics of performance measurement - Random Sampling

I've been working with performance data for over two years now, and over that time I've picked up a little bit of statistics. I am by no means an expert in the subject and I'd probably have flunked statistics in college if it wasn't for a classmate ramming the entire 3Kg statistics book into my head the week before our exams. One thing I understand now though, is that statistics is an extremely important skill for any professional in any field.

Anyway, with a few refreshers from wikipedia and many other texts and papers and a bunch of experts at Yahoo!, I think I've learnt a bit. I'll put it down here mainly so I don't forget.

Why statistics?

So we're often asked by management for a number that represents the performance of a web page. The big question is, what should that number be? Is it the YSlow score, the network roundtrip time, page load time, a combination of these, or some other measure that shows how much the user likes or dislikes the page? Once we know which number we want to measure, how do we determine how this affects all our users? The easiest way is to actually measure what every user sees, but then we end up with thousands, possibly even millions of measurements for different users, and not all of them are the same.

This is where statistics comes in, and where I realised how little I knew.

A whole bunch of terms come up. The ones I've run into so far are: Random Sample, Central Tendency, Mean, Median, Geometric Mean, Normal distribution, Log-normal distribution, Outliers, IQR filtering, Standard Error, Confidence Interval. There are probably more that will come up in the course of the next few posts, but this is what I can remember for now.

Random Samples

Now say the app whose performance we want to measure typically has about 1 million users a day. If we want to measure our users' perceived performance, we could measure the performance for all interactions of all 1 million users, but this is overkill. In reality, you only need to know the performance for maybe 10% of your users, and just project from there.

Now how we select these 10% is important. If we select the fastest 10%, then our numbers will appear better than they actually are, and if we select the slowest 10%, the opposite happens. There are, in fact, many variables on the user's end that affect performance and are beyond our control. These are called confounding factors and are covered in this article by Zed Shaw under the section on Confounding. Since we can't control these variables, the best thing we can do is completely randomize our sample, which reduces the effect that these variables have over a large sample.

Some important things about a random sample are:
  • It should not be deterministic, i.e., I shouldn't be able to predict who will be chosen using a formula
  • It shouldn't be repetitive, i.e., if I run my randomiser multiple times, I should not end up with identical sets
  • All members of the population should have an equal probability of ending up in the selected sample
The wikipedia articles on random sampling may not be up to the mark, but this article in the social research methods knowledge base is pretty good. There's also an excellent video on TED by Oxford mathematician Peter Donnelly on how stats fool juries. The real stats starts at 3:30 and random sampling comes in at 11:00.

So, how do we go about getting a random sample of all our users or page views? On the face of it, it's really quite easy. I'll only consider page views here because it means I don't have to bother with maintaining sessions. Here's the simple algorithm:
  1. N is your population size — that's the total number of page views that you get in, say, one day
  2. f is the fraction of your population that you want in your sample.
  3. Use something like simple random sampling to pull out N*f items from your population — this is n
  4. Measure the performance of these n items
In PHP, you'd do something like this:
$N_a = array(...);                               // array of entire population
$f = 0.10;                                       // 10%
$n_a = array_rand($N_a, intval(count($N_a)*$f)); // $n_a now contains the keys of your sample
Other languages are similar. This is easy to do if you already have access to your entire population. For example, at the end of a day, you can pull out requests from your HTTP access logs. The problem, however, is that you'd need to first measure the performance of all page views, and then randomly pick 10% of them. You'll be throwing away a lot of data, and all your users have to pay the price of the performance test, which could be non-negligible. (The observer effect, often mistaken for Heisenberg's uncertainty principle, plays a part here: you can't measure something without changing what you're measuring.)
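
As an example of "other languages", here's roughly how the same simple random sample could be drawn in JavaScript. It's only a sketch. simpleRandomSample and the made-up request names are mine, and I've used a partial Fisher-Yates shuffle since JavaScript has no built-in equivalent of array_rand().

```javascript
// Return a simple random sample of floor(population.length * f) items.
// random is injectable and is assumed to return a number in [0, 1).
function simpleRandomSample(population, f, random = Math.random) {
  const items = population.slice();   // leave the caller's array untouched
  const n = Math.floor(items.length * f);
  // Partial Fisher-Yates shuffle: after i steps, the first i slots hold a
  // uniformly chosen sample of size i.
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(random() * (items.length - i));
    [items[i], items[j]] = [items[j], items[i]];
  }
  return items.slice(0, n);
}

// Made-up population of 50 page views, sampled at 10%.
const pageViews = Array.from({ length: 50 }, (_, i) => "request-" + (i + 1));
const sample = simpleRandomSample(pageViews, 0.10);
console.log(sample.length); // 5
```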

A better solution would be to only instrument your code with performance measurement hooks if the particular request falls into your sample. This makes the whole thing a little more complicated because we do not have access to the entire population when we need to make this decision. What we do have though, is an estimate (based on past experience) of the total number of page views (N) and the sampling fraction (f) based on how big a sample we want. Can we use these two to figure out if a particular request should fall into the sample or not? A simple approach is something like this (PHP):
// Using systematic random sampling
// $N == estimated population size, $n == desired sample size
// $k == intval(floor($N/$n)), an int so that the === below can match

$counter = cache_fetch('counter');
if(is_null($counter)) {     // first hit
    $seed = mt_rand(1, $k);
    $counter = $k-$seed;
}

$counter++;

if($counter === $k) {
    measure_performance(); // instrument with performance code
    $counter = 0;          // restart the count so the next sample is $k requests away
}

cache_store('counter', $counter);
The $counter variable needs to persist across requests, which is why we need to store it in a cross-request cache. Unfortunately, the obvious problem here is that if we have concurrent connections (which any web server has), then we end up with a race condition where multiple processes all read the same value from cache, increment it independently, and then write the new value back, i.e., multiple requests get counted as a single request. To fix this, we'd need to use a mutex to serialize that section of code, but that will throw performance and scalability out of the window. A better solution is required.

A more naïve approach might work:
// $N == estimated population size, $n == desired sample size
// $f == $n/$N;

if(mt_rand() < $f * mt_getrandmax()) {
    measure_performance(); // instrument with performance code
}
This approach does not require global variables shared across processes, and is actually really fast. The question is, is it correct? Let's look at an example that's easier to imagine. Let's say we have a box with 20 marbles in it, identical in all respects except that they are numbered from 1 to 20. We need to pick 5 marbles out of this box at random. If we pick all five together, the probability of a particular marble being selected is 5/20 == 1/4 == 25%.

If we instead pick them one at a time without putting any back, it's tempting to say that each marble has a 1/20 chance of being the first, a 1/19 chance of being the second, a 1/18 chance of being the third and so on, and to add those up to get 27.95%. That overcounts, because 1/19 is the chance of being picked second only if the marble wasn't picked first, which happens 19/20 of the time. The actual chance of being the second pick is 19/20 * 1/19 == 1/20, and the same holds for every other position, so the probability of a particular marble being selected is 5 * 1/20 == 25%, exactly as before.

A third possibility is that we look at each marble in sequence, and then decide whether to select it or throw it away. With this approach, the probability of the first marble being selected is 5/20 == 25%. The probability of the second marble being selected though, depends on whether the first was selected. If the first marble was selected (25%), then we only need 4 more marbles (4/19), but if the first marble was thrown away (75%), we still need 5 marbles (5/19). Since at this point we already know whether the first marble was selected or not, that event is no longer uncertain, and we need to pick one of the two probabilities for the second marble.
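
The sequential approach just described can be sketched in a few lines of JavaScript (it usually goes by the name selection sampling). selectionSample is a name I've picked for illustration, and the random function is passed in so the behaviour can be checked deterministically. Each marble is picked with probability (marbles still needed) / (marbles still remaining), which always yields exactly the number asked for.

```javascript
// Selection sampling: visit each item once and pick it with probability
// (items still needed) / (items still remaining).
// random is assumed to return a number in [0, 1).
function selectionSample(population, n, random = Math.random) {
  const picked = [];
  let needed = n;
  population.forEach((item, index) => {
    const remaining = population.length - index;
    // When needed === remaining this is always true; when needed === 0 it
    // is always false, so exactly n items are picked (for n <= length).
    if (random() * remaining < needed) {
      picked.push(item);
      needed--;
    }
  });
  return picked;
}

const marbles = Array.from({ length: 20 }, (_, i) => i + 1);
console.log(selectionSample(marbles, 5));
```

The catch is the needed counter: it's exactly the kind of state that would have to be shared across requests.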

What we're doing in the code above, however, is giving each marble that we see a 25% chance of being picked and a 75% chance of being discarded, and this has no effect on the probability of selection of the next marble. There is still a 25% chance that it will be selected regardless of whether the first marble was selected or not. In order for the selection of previous marbles to be considered, we'd need to maintain a count of all selected marbles, which is the problem we're trying to avoid here.

The outcome here is that we may not always end up with exactly $n elements. In fact, it's possible that we'll end up with no elements or all elements, though on average, we should end up with close to $n elements.

All things considered, I think it's a close enough approximation without degrading performance.
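
To get a feel for how much the sample size wobbles with the independent coin-flip approach, here's a small JavaScript simulation. It's a sketch with names of my own choosing (bernoulliSample, sampleSizeSpread), using the 20 marbles and 25% from the example above.

```javascript
// Each item is picked independently with probability f.
// random is assumed to return a number in [0, 1).
function bernoulliSample(population, f, random = Math.random) {
  return population.filter(() => random() < f);
}

// Run many trials and record the smallest, largest and mean sample size.
function sampleSizeSpread(populationSize, f, trials, random = Math.random) {
  const population = Array.from({ length: populationSize }, (_, i) => i + 1);
  let min = Infinity;
  let max = -Infinity;
  let total = 0;
  for (let t = 0; t < trials; t++) {
    const size = bernoulliSample(population, f, random).length;
    min = Math.min(min, size);
    max = Math.max(max, size);
    total += size;
  }
  return { min, max, mean: total / trials };
}

// 20 marbles, 25% each, 10000 trials: the mean should sit near 5 even
// though individual trials stray from it.
console.log(sampleSizeSpread(20, 0.25, 10000));
```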

What about users?

So what about measuring users instead of page views? All of the above looked only at picking a random sample of page views. In order to pick a random sample of users instead, you'll need a unique identifier for each user. This is typically a cookie. You'll want the cookie to last for a reasonably long time. Now, since you know how many cookies you've issued over a given period of time, and since you know the validity of a cookie, you can determine on a given date how many cookies are currently valid, and what they are — this is your population (N). You can then use the simple random sampling method (array_rand()) to pull out a random sample (n) of cookies that you want to measure. Once you have this, it's a simple case of matching the user's cookie to your sample list, and if it matches, send across the measurement code. This is much easier and more correct than what we've seen above, but it has a few caveats:
  • It won't work with users who have cookies disabled
  • There's no guarantee that a user who has fallen into your bucket will ever return to your site
  • New users, who don't already have a cookie won't get tested
  • If you have multiple web servers in a farm, you need to make sure all of them know the sample list
You'll also need to tweak your cookie lifetime based on how often you get new users and what percentage of your users return to the site. If your site requires users to log in, this might be much easier. A combination of the two approaches may also work well.

Closing thoughts

I don't know everything, and wrt statistics, I know very little. There's a good chance that I've gotten things wrong here, so feel free to leave a comment correcting me and chiding me for spreading false information. It will help me learn, and I can update this post accordingly.

Random sampling is pretty important when it comes to getting a good estimate of a measure for a large population. It's applicable in many fields, though I currently only care about its application to performance analysis. I'll post again about the other things I've learnt.

Saturday, January 16, 2010

Notes on the Performance BoF at FOSS.IN

Ok, so it's been over a month and I'm no closer to writing up a proper summary of the performance BoF we had at FOSS.IN, so I'm just going to post my notes as-is:

Attendees:

Philip - Front end performance
Devdas - Throughput, latency for system
Anand - All performance
g0sub - Web applications
Anant - Firefox performance, and mobile devices
Vinayak - System & Web app performance
Tarique - Web server & database performance

Thoughts

  • fast code v/s maintainable code
  • should we look at code or configuration or something else?
  • design for real users, not a few test users

Things to work on:

  1. Configuration
  2. Management/administrative code
  3. Hardware
  4. Instrument code for monitoring
  5. trace apps even if you don't have the code by looking at sar and ltrace
  6. Cheat

How to handle performance of streaming data?

Database performance - start with a denormalized db and then figure out what to normalize later

File format optimisation for performance

Sunday, January 10, 2010

Bandwidth test v1.1

I've just bumped the version number on the bandwidth test to 1.1. There were two major changes that I'll describe below.

Get the ZIP file or Try it online

Changes

The changes in this release were both detected when testing via mobile phones, but they should improve the test's reliability for users of all (javascript enabled) browsers.
  1. I noticed that on the Nokia E71, the latency test wasn't running. After much debugging (this browser doesn't have a firebug equivalent), it turned out that the browser will not fire any events on an image object if the HTTP response's content length is 0.

    Since I was using a 0 byte file named image-l.png, this file's content-type was set to image/png, but its content-length was 0. Most browsers fired the onerror event when this happened, but Nokia's browser which is based on AppleWebKit/413 fires nothing. I then changed the image to return a 204 No Content HTTP response code, but I had the same problem. The only solution was to send some content. After playing around with several formats, I found that GIF could generate the smallest image of all at 35 bytes, so I used that. I haven't noticed any change in latency results on my desktop browser after the change, so I think it should be okay.

    This also means that browsers will now fire the onload event instead of onerror, so I changed that code as well.
  2. The second change fixes a bug in the code related to timed out images. This is what was happening.

    A browser downloads images of progressively larger size until it hits an image that takes more than 3 seconds to download. When this happens, it aborts the run, and moves on to the next run, or to the latency test.

    The bug was that even though the javascript thought it was aborting the run, the browser did not stop the download, even after setting the image object to null. As a result of this, the next run, or the latency check was running in parallel with this big download. In some cases, this meant that the next run would be slower than it should have been, but in other cases, it meant that the images for the next run would block and not be downloaded until this big image completed. The result of this was that those other images would also time out, making the problem worse.

    For the latency check, this meant that the image would never download, and that's why latency would show up as NaN -- I was trying to get the median of an empty array.

    I fixed this by changing the timeout logic a bit. Now a timeout does not abort the run, it only sets an end of run flag. Once the currently downloading image completes, successfully or not, the handler sees the flag, and terminates the run at that point. There are two benefits to this. The first is that this bug is fixed. The second is that we can now reduce the overall timeout since we are guaranteed to have at least one image load. So, the test should now complete faster.
  3. A third minor change I made was in the timeout values for each image. I've increased them a little for the small images so that the test still works on really slow connections, like AT&T's 2G network, which gives me about 30-40kbps.
All together, this should provide a more reliable test for everyone, and a test that actually works on mobile phones.
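
The end-of-run flag from the second change can be sketched like this. It is not the code from bw-test.js. createRun, its method names and the image names are made up to show the idea that a timeout only sets a flag, and the run actually ends when the image currently downloading finishes.

```javascript
// Hypothetical run controller; the names are illustrative, not the library's.
function createRun(images, onRunEnd) {
  let index = 0;
  let endOfRun = false;
  let finished = false;
  const loaded = [];

  return {
    // Called when the run's timer fires: only set the flag, abort nothing.
    timeout() {
      endOfRun = true;
    },
    // Called from the current image's onload/onerror handler.
    // Returns the next image to request, or null when the run is over.
    imageDone(success) {
      if (finished) return null;
      if (success) loaded.push(images[index]);
      index++;
      if (endOfRun || index >= images.length) {
        finished = true;
        onRunEnd(loaded);
        return null;
      }
      return images[index];
    },
  };
}

// The timer fires while the third image is still downloading, so the run
// ends once that image completes and the fourth is never requested.
let result = null;
const run = createRun(["image-0", "image-1", "image-2", "image-3"], (done) => {
  result = done;
});
run.imageDone(true); // image-0 loaded, image-1 is requested next
run.imageDone(true); // image-1 loaded, image-2 is requested next
run.timeout();       // timer fires; image-2 keeps downloading
run.imageDone(true); // image-2 completes, the handler sees the flag
console.log(result); // image-0, image-1 and image-2
```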

Thanks

I'd like to thank all my friends who tested this with their iPhones - that's where the timeout+parallel downloads bug was most visible, and there is no way I'd have fixed it without your help. Stay tuned for more posts on what I've learnt from it all.

So, go get the code, run your own tests, read my source code, challenge my ideas. If you think something isn't done correctly, let me know, or even better, send in a patch. The code is on github.

Short URL: http://tr.im/jsbwtest11

Thursday, January 07, 2010

Handling document.write in dynamic script nodes

As a performance junkie, I generally want my pages to be as fast as I can make them. When I control the entire page, that typically means going all out. As a web user, I get annoyed when the little spinner on my browser keeps spinning even though I know that all the essential content on my page has loaded. When I redesigned my website late last year, I decided to address this issue.

Now if you take a look at my homepage, you'll notice a bunch of external resources:
  1. My Yahoo! Avatar
  2. My twitter feed
  3. My delicious bookmarks
  4. My upcoming events
  5. My dopplr badge
  6. My flickr photos
That's resources from six services whose performance I cannot control, nor rely upon. They're also six components on my page that aren't critical to the content of my page, and if one of them were unavailable for some amount of time, that wouldn't really hurt the purpose of the page.

Now I've been working with dynamic script nodes for a very long time to do background loading of scripts, but in all those cases, those scripts played nicely with other things on the page, and had JSONP responses. Not all the resources that I use now have this behaviour, so I had to come up with something else. Let's go through them one at a time.

To start with, I just included the javascript that all these services told me to include. Since the Avatar is just an image, I just used an img tag and left it at that. I've also never seen any performance issues with my Y! Avatar. The other services, however, all went down at some point or the other, and all had to be included as javascript.

Twitter

I started with the twitter widgets page. I copied the code, and pasted it where I wanted the twitter widget to show up. It's a lot of code, but that was okay to start with:
<script src="http://widgets.twimg.com/j/2/widget.js"></script>
<script>
new TWTR.Widget({
  version: 2,
  type: 'profile',
  rpp: 4,
  interval: 6000,
  width: 250,
  height: 300,
  theme: {
    shell: {
      background: '#333333',
      color: '#ffffff'
    },
    tweets: {
      background: '#000000',
      color: '#ffffff',
      links: '#4aed05'
    }
  },
  features: {
    scrollbar: false,
    loop: false,
    live: false,
    hashtags: true,
    timestamp: true,
    avatars: false,
    behavior: 'all'
  }
}).render().setUser('bluesmoon').start();
</script>
I then had to figure out if I could easily move the code to the bottom of my document so that it didn't block the rest of my page's load, since twitter tends to go down more often than any of the other services.

I read through the source code for widget.js and found out that it creates a DIV into which it writes itself. However, you can create the DIV yourself, and pass its id to the widget constructor. The new code becomes:
<script src="http://widgets.twimg.com/j/2/widget.js"></script>
<script>
new TWTR.Widget({
  version: 2,
  type: 'profile',
  rpp: 4,
  id: 'twitter_widget',
  interval: 6000,
  width: 250,
  height: 300,
  theme: {
    shell: {
      background: '#333333',
      color: '#ffffff'
    },
    tweets: {
      background: '#000000',
      color: '#ffffff',
      links: '#4aed05'
    }
  },
  features: {
    scrollbar: false,
    loop: false,
    live: false,
    hashtags: true,
    timestamp: true,
    avatars: false,
    behavior: 'all'
  }
}).render().setUser('bluesmoon').start();
</script>
I could then create a DIV with an id of twitter_widget where I wanted the widget to go, and push the twitter code to the bottom of my page. This worked well. Kudos to Dustin Diaz for building a flexible widget, but really, you need to make those API docs available somewhere in that widget.

Anyway, we'll get back to twitter later, let's move on.

delicious

Finding the delicious badge was the toughest part. It's hidden on the help page under tools. Anyway, if you don't want to search for it, this is the link for delicious linkrolls.

After configuring the widget, I ended up with this javascript:
(I've split it onto multiple lines for readability)
<script type="text/javascript"
    src="http://feeds.delicious.com/v2/js/bluesmoon?
         title=My%20Delicious%20Bookmarks
         &icon=s
         &count=5
         &bullet=%E2%80%A2
         &sort=date
         &name
         &showadd"></script>
This code is not very nice, because I can't just move it elsewhere in the document. Looking at the page source, it seems that it initialises a Delicious namespace, and then loads two other javascript files. The first handles rendering of the linkroll, and the second is a JSONP feed of my links. Unfortunately, the first script chooses to write to the document using the document.write() javascript function.

Note that this function is the primary reason for javascript blocking page rendering -- it can modify the structure of the page as it loads. I decided to tackle this later.

Upcoming

Upcoming's badges are linked to from the page footer, so it was easy to start with this. I got the code, but decided to style it myself. The code points to the following javascript file (again wrapped for readability):
http://badge.upcoming.yahoo.com/v1/?
    badge_type=user
    &badge_size=sm
    &badge_layout=v
    &badge_styling=2
    &badge_no_background=
    &badge_venue=1
    &date_format_type=us_med
    &id=54783
The source of this badge shows that it also uses document.write() to write itself into the document. Solving this problem would tackle upcoming and delicious as well.

Dopplr

Dopplr was next, and was by far the easiest to work with. The account section points to a Blog badge which gives you a bunch of javascript to include onto your page among other things. The javascript link for my account is:
http://www.dopplr.com/blogbadge/script/6d1f4effa8fc5ac6db60160860ece8be?
    div-id=dopplr-blog-badge-for-bluesmoon
And the source code of that script had a lot of comments saying exactly what you can do with it. Brilliant. I just created a DIV with an id of my choice, and pushed this script to the bottom of the page.

Flickr

After the Avatar, delicious and upcoming, Flickr was the fourth Yahoo! property on my page. The previous two had already proved bad players, so my hopes weren't too high. Still, flickr has been good in the past, so I looked into it. The flickr badge page has a wizard to create the badge for you. This got me the following javascript:
http://www.flickr.com/badge_code_v2.gne?
    count=10
    &display=random
    &size=s
    &layout=x
    &source=user
    &user=57155801%40N00
Looking at the source of that showed the same old problem: document.write().

It was time to tackle this beast.

document.write()

To handle document.write(), I had to redefine what it did so that it would work asynchronously even after the entire page had loaded. I came up with this javascript:
document.writeln = document.write = function(s) {
 var id='';
 if(s.match(/\bupcoming_badge/)) {
  id='upb_events';
 }
 else if(s.match(/\bflickr_badge_image\b/)) {
  id='flickr_badge_wrapper';
 }
 else if(s.match(/\bdelicious\b/)) {
  id='delicious_widget';
 }
 else {
  id='overflow_div';
 }

 document.getElementById(id).innerHTML = s;
 return true;
};
It checks the content of the HTML to be written, and if it matches one of the three badges that I expect, I write it into the innerHTML of a DIV that I've already created for that badge. If it's something I don't recognise, then I assume that it doesn't matter where on the page it shows up, so just write it to an off-screen DIV.

I also had to make sure that there was no javascript being written out -- which is what the delicious badge was doing. In that case, I included the resulting javascript instead of including the javascript that the badge gave me.

Asynchronous loading

This worked, except that badges were still loaded in the order that I included the javascript, and if one blocked, all the others would wait, so I needed to make things asynchronous. This was easily accomplished using dynamic script nodes and handling the onreadystatechange event.
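
In case dynamic script nodes are new to you, the general pattern looks something like the sketch below. It isn't the exact code from my page. loadScript is an illustrative name, and the document is passed in as a parameter purely so that the function can also be exercised against a stub.

```javascript
// doc is passed in so the sketch can run against a stub as well as a real
// document. callback is optional and is called once with the script's URL.
function loadScript(doc, src, callback) {
  const script = doc.createElement("script");
  let done = false;

  function finish() {
    if (done) return;          // some browsers fire both events
    done = true;
    script.onload = script.onreadystatechange = null;
    if (callback) callback(src);
  }

  script.onload = finish;      // standards-based browsers
  script.onreadystatechange = function () {   // older Internet Explorer
    if (script.readyState === "loaded" || script.readyState === "complete") {
      finish();
    }
  };

  script.src = src;
  doc.getElementsByTagName("head")[0].appendChild(script);
  return script;
}
```

In a browser you'd call it as loadScript(document, badgeUrl, callback) once per badge. Since none of the nodes waits for any other, a slow service only delays its own badge.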

With this solved, I decided to parallelise downloads by moving all the code back into the head of the document. That way the script nodes would download in parallel with the document. Unfortunately, that also meant that some scripts might load up before the DIVs they need are available. Only dopplr handled this case well. For all the others, I had to handle it.

I ended up changing the write function above to defer until the required DIV was available.
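
Here's a minimal sketch of what such a deferring write function could look like. This is not the code from my source file. makeDeferredWrite, the injected pickId and schedule functions, the 100ms retry interval and the 50-attempt limit are all assumptions made for illustration.

```javascript
// Returns a replacement for document.write that waits for its target DIV.
// doc, pickId and schedule are injected so the sketch can run outside a
// browser; the 100ms retry and the attempt limit are arbitrary choices.
function makeDeferredWrite(doc, pickId, schedule, maxAttempts = 50) {
  return function write(html) {
    const id = pickId(html);   // decide which DIV this HTML belongs to
    let attempts = 0;
    (function tryWrite() {
      const target = doc.getElementById(id);
      if (target) {
        target.innerHTML = html;
      } else if (++attempts < maxAttempts) {
        schedule(tryWrite, 100);   // DIV not parsed yet, try again shortly
      }
    })();
    return true;
  };
}
```

In a browser this could be wired up as document.write = makeDeferredWrite(document, pickId, function(fn, ms) { setTimeout(fn, ms); }), where pickId holds the same regex checks as the function above.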

Rather than include the entire Javascript here, I'll point you to the source file that you can see for yourself. It's not commented, but it's fairly small, so if you already know javascript, it should be easy to read.

With that, I had five badges on my page, all loaded asynchronously, and in parallel with each other and the page itself. My entire page is now reduced to four synchronous components: The HTML, CSS, Javascript and Avatar image (and possibly the favicon). Everything else is loaded asynchronously in the background and affects neither the firing of my onload handler, nor the browser's spinner.

You can see it in action on bluesmoon.info. Go ahead and view source.

Short URL: http://tr.im/docwritedynscr

Saturday, January 02, 2010

Run your own bandwidth test

Happy 2010 to all my readers. A new post to start off this year. Back in November, I wrote about bandwidth testing through javascript. I've just created a github project for it, and am making the code available for anyone to download and use on their own servers. Use this if you want to know what your site's users' bandwidth is, and perhaps customise the experience for different bandwidths.

Get the ZIP file
(updated to 1.1)

The zip file contains:
  • bw-test.js: the javascript file you need
  • image-*.png: bandwidth testing images
  • README: brief instructions
  • tests/*.html: simple tests that show you how to use the code
Unzip the file into a directory in your web root. You really only need the javascript and the images, so you can get rid of the rest. The javascript does not need to be in the same directory as the images, just make sure the base_url variable points to the URL where the images are. This means that you can offload the javascript onto a CDN and keep the images on your own server. You shouldn't push the images to a CDN, because that will measure your user's effective bandwidth when accessing the CDN and not your server, which is presumably what you'd like to know. You'd probably also want to minify the javascript before using it. I'll provide a minified version in a later release.

The test runs automatically once the code is included, so to avoid interfering with the download of your page's components, make sure it's the last thing on your page.

If you want to get a little more adventurous, you could set the auto_run variable to false, and then start the test whenever you're ready to run it by calling PERFORMANCE.BWTest.run().

Once the test completes, it will fire the PERFORMANCE.BWTest.oncomplete event. You can attach your own function to this event to do what you want with the results. You can also beacon back the results to your server by setting the beacon_url variable. This URL will be called with the following URL parameters:
  • bw: The median bandwidth in bytes/second
  • latency: The median HTTP latency in milliseconds
  • bwg: The geometric mean of bandwidth measurements in bytes/second
  • latencyg: The geometric mean of HTTP latency measurements in milliseconds
Your script that handles this URL may store these details in a database keyed on the user's IP or at least some part of it, and perhaps set a cookie to avoid running the test if you already know their bandwidth.
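
As a rough idea of what the receiving end might do, here's a JavaScript sketch that pulls the four parameters out of a beacon request. Only the parameter names come from the list above. parseBeacon, the /beacon path and the sample values are made up.

```javascript
// Parse the query string of a beacon request into numbers.
// Only the parameter names are from the bandwidth test; the rest is
// illustrative.
function parseBeacon(requestUrl) {
  // The base is only needed so that relative URLs like "/beacon?bw=1" parse.
  const params = new URL(requestUrl, "http://localhost/").searchParams;
  const result = {};
  for (const key of ["bw", "latency", "bwg", "latencyg"]) {
    const raw = params.get(key);
    const value = raw === null || raw === "" ? NaN : Number(raw);
    // Missing, empty or non-numeric values (such as "NaN") become null.
    result[key] = Number.isFinite(value) ? value : null;
  }
  return result;
}

console.log(parseBeacon("/beacon?bw=125000&latency=87&bwg=118000.5&latencyg=NaN"));
```

Treating anything non-numeric as null means a failed latency test doesn't end up in your database as a bogus number.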

The code is distributed under the BSD license, so use it as you like.

Note that all variables mentioned above are in the PERFORMANCE.BWTest namespace.
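
If you're wondering how the median and geometric mean in the beacon differ, here's a small sketch of both. These are textbook definitions written out in JavaScript, not code lifted from bw-test.js, and the function names are mine.

```javascript
// Median: the middle value of the sorted readings (or the average of the
// two middle values). Returns NaN when there are no readings.
function median(values) {
  if (values.length === 0) return NaN;
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Geometric mean: the nth root of the product of n positive readings.
function geometricMean(values) {
  if (values.length === 0) return NaN;
  // Average the logarithms rather than multiplying, to avoid overflow.
  const logSum = values.reduce((sum, v) => sum + Math.log(v), 0);
  return Math.exp(logSum / values.length);
}

const readings = [90000, 110000, 120000, 125000, 400000]; // made-up bytes/second
console.log(median(readings), geometricMean(readings));
```

Both are less sensitive to a single freak reading than the arithmetic mean is.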
