Note that I'm using MathJax to render the equations on this post. It can take a while to render everything, so you may need to wait a bit before everything shows up. If you're reading this in a feed reader, then it will all look like gibberish (or LaTeX if you can tell the difference).
So I've been playing with this idea for a while now, and bounced it off YSlow dev Antonia Kwok a few times, and we came up with something that might work. The question was whether we could estimate a page's expected roundtrip time for a particular user just by looking at the page structure. The answer is much longer than that, but tends towards possibly.
Let's break the problem down first. There are two large unknowns in there:
- a particular user
- the page's structure
Both of these break down into measurable variables (collected into a data-structure sketch after this list):

- Network characteristics:
  - Bandwidth to the origin server (\(B_O\))
  - Bandwidth to your CDN (\(B_C\))
  - Latency to the origin server (\(L_O\))
  - Latency to your CDN (\(L_C\))
  - DNS latency to their local DNS server (\(L_D\))
- Browser characteristics:
  - Number of parallel connections to a host (\(N_{Hmax}\))
  - Number of parallel connections overall (\(N_{max}\))
  - Number of DNS lookups it can do in parallel (\(N_{Dmax}\))
  - Ability to download scripts in parallel
  - Ability to download CSS in parallel (with each other and with scripts)
  - Ability to download images in parallel with scripts
- Page characteristics:
  - Document size (\(S_O\))
  - Size of each script (\(S_{S_i}\))
  - Size of each non-script resource (images, CSS, etc.) (\(S_{R_i}\))
  - Number of scripts (\(N_S\))
  - Number of non-script resources (\(N_R\))
  - Number of hostnames (\(N_H\)), further broken down into:
    - Number of script hostnames (\(N_{SH}\))
    - Number of non-script hostnames (\(N_{RH}\))
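To make the later calculations concrete, here's one way these inputs might be grouped into a data structure. This is a minimal sketch of my own; the interface and field names are illustrative and not part of YSlow or BrowserScope:

```typescript
// Illustrative containers for the model's inputs; all names are my own.
interface NetworkCharacteristics {
  bandwidth: number;   // B_C, in bytes/second (we assume B_O = B_C)
  latency: number;     // L_C, in seconds
  dnsLatency: number;  // L_D, in seconds
}

interface BrowserCharacteristics {
  maxConnectionsPerHost: number; // N_Hmax
  maxConnections: number;        // N_max
  maxParallelDns: number;        // N_Dmax, assumed to be 1
  parallelScripts: boolean;      // can it download scripts in parallel?
}

interface PageCharacteristics {
  documentSize: number;      // S_O, in bytes
  scriptSizes: number[];     // S_S_i; N_S = scriptSizes.length
  resourceSizes: number[];   // S_R_i; N_R = resourceSizes.length
  hostnames: number;         // N_H
  scriptHostnames: number;   // N_SH
  resourceHostnames: number; // N_RH
  serverTime: number;        // L_S, in seconds
}
```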
To simplify the equation a bit, we assume that bandwidth and network latency from the user to the CDN and the origin are the same. Additionally, the latency for the main page includes both network latency and the time it takes the server to generate the page (\(L_S\)). Often this time can be significant, so we redefine the terms slightly:
\begin{align}
B_O & = B_C \\
L_O & = L_S + L_C
\end{align}
Browser characteristics are easy enough to obtain. Simply pull the data from BrowserScope's Network tab. It contains almost all the information we need. The only parameter not listed is the number of parallel DNS lookups that a browser can make. Since it's better to err on the side of caution, we assume that this number is 1, so for all further equations, assume \(N_{Dmax} = 1\).
Before I get to the equation, I should mention a few caveats. It's fairly naïve: it assumes that all resources that can be downloaded in parallel will be downloaded in parallel, that there's no blank time between downloads, and that the measured bandwidth \(B_C\) is less than the actual channel capacity, so multiple parallel TCP connections all get access to the full bandwidth. This isn't far from the truth for high-bandwidth users, but it does break down when we get to dial-up speeds. Here's the equation:
\[
T_{RT} = T_P + T_D + T_S + T_R
\]
Where:
\begin{align}
T_P \quad & = \quad L_O + \frac{S_O}{B_C} \\
T_D \quad & = \quad \frac{N_H}{N_{Dmax}} \times L_D \\
T_S \quad & = \quad \sum_{i=1}^{N_{SG}} \left( \frac{S_{SG_i\,max}}{B_C} + L_C \right) \\
N_{SG} \quad & = \quad \left\{
\begin{array}{l l}
\frac{N_S}{\min \left( N_{Hmax} \times N_{SH}, N_{max} \right)} & \quad \text{if the browser supports parallel scripts} \\
N_S & \quad \text{if the browser does not support parallel scripts}
\end{array} \right. \\
S_{SG_i\,max} \quad & = \quad \text{Size of the largest script in script group } SG_i \\
T_R \quad & = \quad \sum_{i=1}^{N_{RG}} \left( \frac{S_{RG_i\,max}}{B_C} + L_C \right) \\
N_{RG} \quad & = \quad \frac{N_R}{\min \left( N_{Hmax} \times N_{RH}, N_{max} \right)} \\
S_{RG_i\,max} \quad & = \quad \text{Size of the largest resource in resource group } RG_i
\end{align}
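As a sanity check on the notation, here's the same set of equations transcribed directly into code. It's a minimal sketch under the same assumptions, using the illustrative types from earlier; the per-group maximum sizes are taken as given here and come from the grouping algorithm described below:

```typescript
// Direct transcription of the equations above. Times are in seconds,
// sizes in bytes; the types are the illustrative ones sketched earlier.

// T_P = L_O + S_O / B_C, where L_O = L_S + L_C
function pageTime(net: NetworkCharacteristics, page: PageCharacteristics): number {
  return page.serverTime + net.latency + page.documentSize / net.bandwidth;
}

// T_D = (N_H / N_Dmax) * L_D; lookups are assumed sequential, so N_Dmax = 1
function dnsTime(net: NetworkCharacteristics, browser: BrowserCharacteristics, page: PageCharacteristics): number {
  return (page.hostnames / browser.maxParallelDns) * net.dnsLatency;
}

// T_S or T_R: each group costs its largest member's transfer time plus one
// round of latency. `maxima` holds S_SG_imax (or S_RG_imax) per group.
function groupedTime(net: NetworkCharacteristics, maxima: number[]): number {
  return maxima.reduce((t, size) => t + size / net.bandwidth + net.latency, 0);
}

// T_RT = T_P + T_D + T_S + T_R
```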
So this is how it works...
We assume that the main page's download time is a linear function of its size, bandwidth, the time it takes for the server to build the page and the network latency between the user and the server. While this is not correct (consider multiple flushes, bursty networks, and other factors), it is close.
We then consider all scripts in groups based on whether the browser can handle parallel script downloads or not. Script groups are populated based on the following algorithm:
```
for each script:
    if size of group > Nmax:
        process and empty group
    else if number of scripts in group for a given host > NHmax:
        ignore script for the current group, reconsider for next group
    else:
        add script to group
process and empty group
```

If a browser cannot handle parallel scripts, then we just temporarily set \(N_{max}\) to 1.
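In code, that grouping pass might look like the following. This is a sketch based on my reading of the pseudocode above; the `Item` shape and the function name are hypothetical. The same routine covers the non-script case below, with \(N_{RH}\) in place of \(N_{SH}\):

```typescript
// Partition downloads into groups and return the largest size in each group
// (S_SG_imax / S_RG_imax). perHostLimit is N_Hmax; groupLimit is
// min(N_Hmax * N_SH, N_max). Pass groupLimit = 1 for a browser that
// cannot download scripts in parallel.
interface Item { size: number; host: string; } // hypothetical shape

function downloadGroupMaxima(items: Item[], perHostLimit: number, groupLimit: number): number[] {
  const maxima: number[] = [];
  let pending = items.slice();
  while (pending.length > 0) {
    const group: Item[] = [];
    const perHost = new Map<string, number>();
    const deferred: Item[] = [];
    for (const item of pending) {
      const hostCount = perHost.get(item.host) ?? 0;
      if (group.length >= groupLimit || hostCount >= perHostLimit) {
        deferred.push(item); // group full or host saturated: reconsider next round
      } else {
        group.push(item);
        perHost.set(item.host, hostCount + 1);
      }
    }
    maxima.push(Math.max(...group.map((i) => i.size))); // "process" the group
    pending = deferred;
  }
  return maxima;
}
```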
Similarly, we consider the case for all non-script resources:
```
for each resource:
    if size of group > Nmax:
        process and empty group
    else if number of resources in group for a given host > NHmax:
        ignore resource for the current group, reconsider for next group
    else:
        add resource to group
process and empty group
```
For DNS, we assume that all DNS lookups are done sequentially. This makes our equation fairly simple, but turns our result into an overestimate.
Overall, this gives us a fairly good guess at what the roundtrip time for the page would be, but it only works well for high bandwidth values.
We go wrong with our assumptions in a few places. For example, we don't consider that resources may download in parallel with the page itself, or that once the smallest script/resource in a group has been downloaded, the browser can start downloading the next script/resource. We ignore the fact that some browsers can download scripts and resources in parallel, and we assume that the browser takes no time to actually execute scripts and render the page. These assumptions introduce some error into our calculations; however, we can correct for them in the lab. Since the primary purpose of this experiment is to estimate the roundtrip time of a page without actually pushing it out to users, this isn't a bad thing.
So, where do we get our numbers from?
All browser characteristics come from BrowserScope.
The user's bandwidth is variable, so we leave it as a parameter to be filled in by the developer running the test. We could select five or six bandwidth values that best represent our users, based on the numbers we get from boomerang. Again, since this equation breaks down at low bandwidth values, we can simply ignore those.
The latency to our CDN is something we can either pull out of data that we've already gathered from boomerang, or something we can calculate with a simple and not terribly incorrect formula:
\[
L_C = 4 \times \frac{\mathrm{distance}\left(U \leftrightarrow C\right)}{c_{fiber}}
\]
Where \(c_{fiber}\) is the speed of light in fiber, which is approximately \(2 \times 10^8 m/s\).
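As a worked example (the distance is one I've picked for illustration), a user about 4,000km from the CDN gets \(L_C = 4 \times \frac{4 \times 10^6\,m}{2 \times 10^8\,m/s} = 0.08s\), i.e. roughly 80ms.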
DNS latency is a tough number to pin down, but since most people are fairly close to their ISP's DNS servers, we can assume that this number is between 40 and 80ms. The worst case is much higher than that, but on average, this should be about right.
The last number we need is \(L_S\), the time it takes for the server to generate the page. This is something we can determine just by hitting our server from a nearby box, which is pretty much what we do during development; a sketch of that measurement follows.
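Here's roughly what that measurement looks like. It's a sketch with a placeholder URL: from a machine close to the server, the network terms are negligible, so the elapsed wall-clock time approximates \(L_S\).

```typescript
// Rough L_S measurement from a box near the server; the URL is a placeholder.
const start = Date.now();
await fetch("http://origin.internal/page");
const serverTime = (Date.now() - start) / 1000; // ~= L_S, in seconds
```

This brings us to the tool we use to do all the calculations.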
YSlow already analyses a page's structure and looks at the time it takes to download each resource. We just pull the time out from what YSlow already has. YSlow also knows the size of all resources (both compressed and uncompressed), how many domains are in use and more. By sticking these calculations into YSlow, we could get a number that a developer can use during page development.
The number may not be spot on with what real users experience, but a developer should be able to compare two page designs and determine which of these will perform better even if they get the same YSlow score.
Naturally this isn't the end of the story. We've been going back and forth on this some more, and are tending towards more of a CPM approach to the problem. I'll write more about that when we've sorted it out.
For now, leave a comment letting me know what you think. Am I way off? Do I have the right idea? Can this be improved upon? Is this something you'd like to see in YSlow?