Note that I'm using MathJax to render the equations on this post. It can take a while to render everything, so you may need to wait a bit before everything shows up. If you're reading this in a feed reader, then it will all look like gibberish (or LaTeX if you can tell the difference).

So I've been playing with this idea for a while now, and bounced it off YSlow dev Antonia Kwok a few times, and we came up with something that might work. The question was whether we could estimate a page's expected roundtrip time for a particular user just by looking at the page structure. The answer is much longer than that, but tends towards possibly.

Let's break the problem down first. There are two large unknowns in there:

- a particular user
- the page's structure

These can be broken down further into:

- Network characteristics:
  - Bandwidth to the origin server (\(B_O\))
  - Bandwidth to your CDN (\(B_C\))
  - Latency to the origin server (\(L_O\))
  - Latency to your CDN (\(L_C\))
  - DNS latency to their local DNS server (\(L_D\))
- Browser characteristics:
  - Number of parallel connections to a host (\(N_{Hmax}\))
  - Number of parallel connections overall (\(N_{max}\))
  - Number of DNS lookups it can do in parallel (\(N_{Dmax}\))
  - Ability to download scripts in parallel
  - Ability to download CSS in parallel (with each other and with scripts)
  - Ability to download images in parallel with scripts
- Page characteristics:
  - Document size (\(S_O\))
  - Size of each script (\(S_{S_i}\))
  - Size of each non-script resource (images, CSS, etc.) (\(S_{R_i}\))
  - Number of scripts (\(N_S\))
  - Number of non-script resources (\(N_R\))
  - Number of hostnames (\(N_H\)), further broken down into:
    - Number of script hostnames (\(N_{SH}\))
    - Number of non-script hostnames (\(N_{RH}\))

To simplify the equation a bit, we assume that bandwidth and network latency from the user to the CDN and the origin are the same. Additionally, the latency for the main page includes both network latency and the time it takes the server to generate the page (\(L_S\)). Often this time can be significant, so we redefine the terms slightly:

\begin{align}
B_O & = B_C \\
L_O & = L_S + L_C
\end{align}

Browser characteristics are easy enough to obtain. Simply pull the data from BrowserScope's Network tab. It contains almost all the information we need. The only parameter not listed is the number of parallel DNS lookups that a browser can make. Since it's better to err on the side of caution, we assume that this number is 1, so for all further equations, assume \(N_{Dmax} = 1\).

Before I get to the equation, I should mention a few caveats. It's fairly naïve: it assumes that all resources that can be downloaded in parallel will be downloaded in parallel, that there's no blank time between downloads, and that the measured bandwidth \(B_C\) is less than the actual channel capacity, so multiple parallel TCP connections all have access to the full bandwidth. This is not entirely untrue for high bandwidth users, but it does break down when we get down to dial-up speeds. Here's the equation:

\[
T_{RT} = T_P + T_D + T_S + T_R
\]

Where:

\begin{align}
T_P \quad & = \quad L_O + \frac{S_O}{B_C}\\
\\
T_D \quad & = \quad \frac{N_H}{N_{Dmax}} \times L_D\\
\\
T_S \quad & = \quad \sum_{i=1}^{N_{SG}} \left( \frac{S_{SG_imax}}{B_C} + L_C \right) \\
\\
N_{SG} \quad & = \quad \left\{
\begin{array}{l l}
\frac{N_S}{\min \left( N_{Hmax} \times N_{SH}, N_{max} \right) } & \quad \text{if browser supports parallel scripts}\\
\\
N_S & \quad \text{if browser does not support parallel scripts}\\
\end{array} \right. \\
\\
S_{SG_imax} \quad & = \quad \text{Size of the largest script in script group } SG_i\\
\\
T_R \quad & = \quad \sum_{i=1}^{N_{RG}} \left( \frac{S_{RG_imax}}{B_C} + L_C \right) \\
\\
N_{RG} \quad & = \quad \frac{N_R}{\min \left( N_{Hmax} \times N_{RH}, N_{max} \right) }\\
\\
S_{RG_imax} \quad & = \quad \text{Size of the largest resource in resource group } RG_i
\end{align}

So this is how it works...

We assume that the main page's download time is a linear function of its size, bandwidth, the time it takes for the server to build the page and the network latency between the user and the server. While this is not correct (consider multiple flushes, bursty networks, and other factors), it is close.

We then consider all scripts in groups based on whether the browser can handle parallel script downloads or not. Script groups are populated based on the following algorithm:

    for each script:
        if size of group > N_{max}:
            process and empty group
        else if number of scripts in group for a given host > N_{Hmax}:
            ignore script for the current group, reconsider for next group
        else
            add script to group

    process and empty group

If a browser cannot handle parallel scripts, then we just temporarily set \(N_{max}\) to 1.

Similarly, we consider the case for all non-script resources:

    for each resource:
        if size of group > N_{max}:
            process and empty group
        else if number of resources in group for a given host > N_{Hmax}:
            ignore resource for the current group, reconsider for next group
        else
            add resource to group

    process and empty group
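As a rough illustration, the grouping described above can be sketched in Python. This is one interpretation of the pseudocode, not code taken from YSlow: the `Resource` tuple, the function names, and the hostnames and sizes in the example are all made up for illustration. Each group is filled until it holds \(N_{max}\) items or a host reaches \(N_{Hmax}\), and anything that didn't fit is carried over to the next group.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical representation: (hostname, size in bytes)
Resource = Tuple[str, int]


def group_resources(resources: List[Resource], n_max: int, n_hmax: int) -> List[List[Resource]]:
    """Greedily pack resources into groups that could download in parallel.

    A group holds at most n_max resources overall and at most n_hmax from
    any single host. Resources that do not fit are reconsidered for the
    next group, keeping their original order.
    """
    if n_max < 1 or n_hmax < 1:
        raise ValueError("n_max and n_hmax must be at least 1")

    groups: List[List[Resource]] = []
    pending = list(resources)
    while pending:
        group: List[Resource] = []
        per_host: Counter = Counter()
        deferred: List[Resource] = []
        for host, size in pending:
            if len(group) < n_max and per_host[host] < n_hmax:
                group.append((host, size))
                per_host[host] += 1
            else:
                deferred.append((host, size))  # reconsider for the next group
        groups.append(group)  # "process and empty group"
        pending = deferred
    return groups


def largest_per_group(groups: List[List[Resource]]) -> List[int]:
    """Size of the largest item in each group (S_SGimax or S_RGimax)."""
    return [max(size for _, size in group) for group in groups]


# Made-up example: four scripts across two hosts.
scripts = [
    ("a.example", 30_000),
    ("a.example", 10_000),
    ("a.example", 50_000),
    ("b.example", 20_000),
]

# Browser with parallel scripts, N_max = 6 and N_Hmax = 2: two groups.
print(largest_per_group(group_resources(scripts, n_max=6, n_hmax=2)))  # [30000, 50000]

# Browser without parallel scripts: temporarily set N_max to 1.
print(len(group_resources(scripts, n_max=1, n_hmax=2)))  # 4
```

Because the first pending item always fits into an empty group, every pass places at least one resource, so the loop terminates.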

For DNS, we assume that all DNS lookups are done sequentially. This makes our equation fairly simple, but turns our result into an overestimate.

Overall, this gives us a fairly good guess at what the roundtrip time for the page would be, but it only works well for high bandwidth values.
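To make the arithmetic concrete, here is a minimal Python sketch of the whole estimate. It assumes the groups have already been formed, so it takes the largest size in each script group and each resource group as plain lists. The function name, parameter names, units (bytes, bytes per second, seconds) and the example numbers are my own choices for illustration, not values from any real page.

```python
from typing import Sequence


def estimate_roundtrip(
    s_o: float,                          # document size, bytes
    b_c: float,                          # bandwidth, bytes per second
    l_s: float,                          # server page generation time, seconds
    l_c: float,                          # network latency, seconds
    l_d: float,                          # DNS latency per lookup, seconds
    n_h: int,                            # number of hostnames
    script_group_max: Sequence[float],   # largest script size per script group, bytes
    resource_group_max: Sequence[float], # largest resource size per resource group, bytes
    n_dmax: int = 1,                     # parallel DNS lookups, assumed to be 1
) -> float:
    """Return T_RT = T_P + T_D + T_S + T_R in seconds."""
    l_o = l_s + l_c                                      # L_O = L_S + L_C
    t_p = l_o + s_o / b_c                                # main page
    t_d = n_h / n_dmax * l_d                             # DNS lookups
    t_s = sum(size / b_c + l_c for size in script_group_max)
    t_r = sum(size / b_c + l_c for size in resource_group_max)
    return t_p + t_d + t_s + t_r


# Made-up example: 50 kB page, 500,000 bytes/s (about 4 Mbps), 200 ms server
# time, 50 ms network latency, 60 ms DNS latency, 3 hostnames.
t_rt = estimate_roundtrip(
    s_o=50_000, b_c=500_000, l_s=0.2, l_c=0.05, l_d=0.06, n_h=3,
    script_group_max=[30_000, 50_000],
    resource_group_max=[100_000],
)
print(f"{t_rt:.2f} s")  # 1.04 s
```

In this example the terms work out to \(T_P = 0.35\), \(T_D = 0.18\), \(T_S = 0.26\) and \(T_R = 0.25\) seconds.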

We go wrong with our assumptions at a few places. For example, we don't consider the fact that resources may download in parallel with the page itself, or that when the smallest script/resource in a group has been downloaded, the browser can start downloading the next script/resource. We ignore the fact that some browsers can download scripts and resources in parallel, and we assume that the browser takes no time to actually execute scripts and render the page. These assumptions introduce an error into our calculations; however, we can overcome them in the lab. Since the primary purpose of this experiment is to determine the roundtrip time of a page without actually pushing it out to users, this isn't a bad thing.

So, where do we get our numbers from?

All browser characteristics come from BrowserScope.

The user's bandwidth is variable, so we leave that as a variable to be filled in by the developer running the test. We could simply select 5 or 6 bandwidth values that best represent our users based on the numbers we get from boomerang. Again, since this equation breaks down at low bandwidth values, we could simply ignore those.
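One way to run the estimate over a handful of bandwidth values: every term in the equation is either a latency or a size divided by \(B_C\), so the total can be regrouped as a fixed latency part, \(L_O + T_D + (N_{SG} + N_{RG}) \times L_C\), plus the bytes that count towards the estimate, \(S_O + \sum S_{SG_imax} + \sum S_{RG_imax}\), divided by \(B_C\). The sketch below assumes those two totals have already been worked out. The function name and all the numbers are hypothetical.

```python
from typing import Dict, Iterable


def estimate_for_bandwidths(
    fixed_latency: float,             # seconds: L_O + T_D + (N_SG + N_RG) * L_C
    critical_bytes: float,            # bytes: S_O plus the largest item of every group
    bandwidths_mbps: Iterable[float], # candidate user bandwidths in megabits per second
) -> Dict[float, float]:
    """Map each bandwidth (Mbps) to an estimated roundtrip time in seconds."""
    results = {}
    for mbps in bandwidths_mbps:
        bytes_per_second = mbps * 1_000_000 / 8  # megabits/s to bytes/s
        results[mbps] = fixed_latency + critical_bytes / bytes_per_second
    return results


# Made-up page: 0.58 s of fixed latency and 230,000 bytes that count
# towards the estimate, tried at a few bandwidths.
estimates = estimate_for_bandwidths(0.58, 230_000, [1, 2, 4, 8, 16])
for mbps, seconds in estimates.items():
    print(f"{mbps:>2} Mbps: {seconds:.2f} s")
```

As bandwidth grows, the estimate approaches the fixed latency part, so past some point extra bandwidth no longer changes the predicted time much.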

The latency to our CDN is something we can either pull out of data that we've already gathered from boomerang, or something we can calculate with a simple and not terribly incorrect formula:

\[
L_C = 4 \times \frac{\text{distance}\left(U \leftrightarrow C\right)}{c_{fiber}}
\]

Where \(c_{fiber}\) is the speed of light in fiber, which is approximately \(2 \times 10^8\) m/s.
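A quick sketch of that formula in Python, to get a feel for the magnitudes. The function and constant names are made up for illustration, the distance is assumed to be in metres, and the factor of 4 is taken straight from the formula as given.

```python
C_FIBER_M_PER_S = 2e8  # approximate speed of light in fiber, metres per second


def cdn_latency_seconds(distance_m: float, c_fiber: float = C_FIBER_M_PER_S) -> float:
    """Estimate L_C as 4 * distance / c_fiber, in seconds."""
    if distance_m < 0:
        raise ValueError("distance must not be negative")
    return 4 * distance_m / c_fiber


# A user 1,000 km from the CDN node (hypothetical distance).
latency = cdn_latency_seconds(1_000_000)
print(f"{latency * 1000:.0f} ms")  # 20 ms
```

This only accounts for propagation over the stated distance, so real measurements from boomerang would be expected to come out higher.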

DNS latency is a tough number, but since most people are fairly close to their ISPs, we can assume that this number is between 40 and 80 ms. The worst case is much higher than that, but on average, this should be correct.

The last number we need is \(L_S\), the time it takes for the server to generate the page. This is something that we can determine just by hitting our server from a nearby box, which is pretty much what we do during development. This brings us to the tool we use to do all the calculations.

YSlow already analyses a page's structure and looks at the time it takes to download each resource. We just pull the time out from what YSlow already has. YSlow also knows the size of all resources (both compressed and uncompressed), how many domains are in use and more. By sticking these calculations into YSlow, we could get a number that a developer can use during page development.

The number may not be spot on with what real users experience, but a developer should be able to compare two page designs and determine which of these will perform better even if they get the same YSlow score.

Naturally this isn't the end of the story. We've been going back and forth on this some more, and are tending towards more of a CPM approach to the problem. I'll write more about that when we've sorted it out.

For now, leave a comment letting me know what you think. Am I way off? Do I have the right idea? Can this be improved upon? Is this something you'd like to see in YSlow?

## 6 comments

So in these studies, does one assume a completely uncached scenario?

A variability that I have seen is low TTL DNS servers like Akamai DNS. With TTLs around 60 seconds, and multiple such hostnames in a page, it seems that even with caching on, there is always a good probability of users hitting DNS every time. Of course, I am probably incorrectly assuming that browser DNS caches use TTL data.

So should roundtrip measuring tools assume a simplistic model with cache either totally off or totally on, or should they use a probability in the equation?

Secondly, since a single host lookup might require multiple DNS queries by the caching DNS servers (e.g. www.yahoo.com requires resolving two CNAMEs), shouldn't that also be accounted for in the DNS numbers?

I am stressing on DNS calls here, because from personal experience, DNS latencies seem to be relatively more significant in our region.

YSlow also knows which components will be cached and which ones won't, so we could estimate both numbers. We'd have to do some kind of user analysis to determine what percentage of our users have a primed cache to decide what the final number will be.

Regarding DNS, it's really a black hole. There's very little we can do to actually measure what the user's DNS characteristics are, so we just guess. Our worst case scenario is 500ms per DNS request, so 4 lookups require 2 seconds, but there are ways around that.

Very interesting.

A couple of months ago, I've tried to come up with a similar formula and found a paper named "Prediction of short-lived TCP transfer latency on bandwidth asymmetric links". Maybe you can integrate their formula with yours to get even more precise predictions.

However, I must wonder whether it wouldn't be a lot easier (and more precise) just to emulate the network scenario and see what happens. You'll need to emulate two links (one for your server, one for your CDN), but it doesn't seem like a lot of problem.

Great post!

Re: caching, I like the idea of tracking two numbers: First, the worst case, first-time visitor with an unprimed cache. And second, the best case, returning visitor with every object cached that possibly can be.

Variables like visitor recency and browser cache size will place "reality" somewhere in between these two bounds, but at least now I have a band I can manage to. And if I want to learn more about what the distribution of real-world RT times looks like in that band I can use tools like boomerang.js for that :)

Should it account for a connection establishing round-trip and slow-start? A page which requires connections to multiple servers not only incurs DNS round trips, but also the added overhead of establishing the connection (another round trip), and figuring out an optimal packet size?

The thing is that we can only guess about which hosts will actually require a DNS lookup and will incur the cost of slow-start and connection set up time. All of those fit into the guesstimate that makes up \(T_D\).

To be more accurate about this, we'd have to replay the average user's session to determine where they came from and what they already had cached (DNS, connections and objects).
