[philiptellis] /bb|[^b]{2}/
Never stop Grokking



Thursday, January 08, 2015

IE throws an "Invalid Calling Object" Exception for certain iframes

On a site that uses boomerang, I found a particular JavaScript error happening very often:
TypeError: Invalid calling object
This only happens on Internet Explorer, primarily IE 11, but I've seen it on versions as old as 9. I searched through Stack Overflow for the cause of this error, and while many of the cases sounded like they could be my problem, further investigation showed that my case didn't match any of them. The code in particular that threw the exception was collecting resource timing information for all resources on the page. Part of the algorithm involves drilling into iframes on the page, and this error showed up on one particular iframe. There are a few things to note:
   ("performance" in frame) === true;

   frame.hasOwnProperty("performance") === false;
The latter is not a surprise, since hasOwnProperty("performance") is not supported on window objects in IE (I've seen this before when investigating JSLint problems). There was no problem accessing frame.document, but accessing frame.performance threw an exception.
    frame.performance;    // <-- throws "TypeError: Invalid calling object" with error code -2147418113

    frame["performance"]; // <-- throws "TypeError: Invalid calling object" with error code -2147418113
In fact, frame.<anything except document> would throw the same exception. So I looked at the iframe's document object some more, and found this:
    frame.document.location.pathname === "/xxx/yyy/123/4323.pdf";
The frame was pointing to a PDF document, and while IE was creating a reference to hold the performance object of this document, it prevented any attempts to access this reference. I tested Chrome and Firefox, and they both create and populate a frame.performance object for PDF documents.
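
The only reliable workaround I know of is to wrap the property access in a try/catch. A minimal sketch of the guard (the helper name is hypothetical, not actual boomerang code):

    function getFramePerformance(frame) {
        // Any property access other than frame.document can throw
        // "Invalid calling object" on IE when the frame hosts a
        // non-HTML document (eg: a PDF), so guard it.
        try {
            return frame.performance || null;
        }
        catch (e) {
            // IE error code -2147418113; treat as "not available"
            return null;
        }
    }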

Friday, August 22, 2014

jslint's suggestion will break your site: Unexpected 'in'...

I use jslint to validate my JavaScript before it goes out to production. The tool is somewhat useful, but you really have to spend some time ignoring all the false errors it flags. In some cases you can take its suggestions, while in others you can ignore them with no ill effects.

In this particular case, I came across an error where, if you follow the suggestion, your site will break.

My code looks like this:

   if (!("performance" in window) || !window.performance) {
      return null;
   }

jslint complains saying:

Unexpected 'in'. Compare with undefined, or use the hasOwnProperty method instead.

This is very bad advice for the following reasons:

  • Comparing with undefined will throw an exception on Firefox 31 if used inside an anonymous iframe.
  • Using hasOwnProperty will cause a false negative on IE 10 because window.hasOwnProperty("performance") is false even though IE supports the performance timing object.

So the only course of action is to use in for this case.
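
If you'd rather keep this check (and the jslint override) in one place, you can wrap it in a helper. A minimal sketch, with a hypothetical name, that works on frames as well as the top-level window:

   function hasPerformance(win) {
      // "in" is the only check that works everywhere:
      // - comparing with undefined throws on Firefox 31 inside an anonymous iframe
      // - win.hasOwnProperty("performance") is false on IE 10 even though
      //   window.performance exists there
      return ("performance" in win) && !!win.performance;
   }

   var timing = hasPerformance(window) ? window.performance.timing : null;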

Sunday, December 11, 2011

Using curl with IPv6 addresses

Assuming you have curl compiled with IPv6 support, if you want to hit a page using its IPv6 address rather than its hostname, you have to do it as follows:
curl "http://\[2600:xxx:yyy::zzz\]/page.html"
The square brackets are required to tell curl that it's an IPv6 address and not a host:port pair. The quotes are required to stop the shell from treating the square brackets as a glob. The backslashes are required to stop curl from treating the square brackets as a range specification. The http:// is optional, but good form. None of this is required if you use a hostname or an IPv4 address.
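
If your curl is new enough, you can also use the -g (--globoff) switch to turn off curl's bracket globbing and skip the backslashes (the quotes are still needed for the shell):

curl -g "http://[2600:xxx:yyy::zzz]/page.html"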

Tuesday, January 25, 2011

device-width and how not to hate your users

I've been catching up on my technical reading, and this weekend was spent on Responsive Enhancement [1]. I'd read about it before on Jeremy Keith's blog, and his comments on proportion perfection over pixel perfection [2] made me think. Finally, Kayla's report [3] on Smashing Magazine about responsive web design, coming up just as I was thinking about making bluesmoon.info more mobile-friendly, is what prompted me to study it in detail.

I'm not going to go into the details of responsive enhancement, the references at the end of this article serve that purpose. This article lists what I think are best practices and my reasons for them.

@media queries

As a web designer or developer, you want your page to be easily viewable across different devices and screen sizes. It shouldn't matter whether your user uses a 21" desktop monitor, a 13" laptop, a 10" iPad or a much smaller smartphone. Responsive web design uses @media queries to change the layout of the page using CSS based on browser width. You might have CSS that looks like this:
/* Default wide-screen styles */

@media all and (max-width: 1024px) {
    /* styles for narrow desktop browsers and iPad landscape */
}

@media all and (max-width: 768px) {
    /* styles for narrower desktop browsers and iPad portrait */
}

@media all and (max-width: 480px) {
    /* styles for iPhone/Android landscape (and really narrow browser windows) */
}

@media all and (max-width: 320px) {
    /* styles for iPhone/Android portrait */
}

@media all and (max-width: 240px) {
    /* styles for smaller devices */
}
And yes, you could go smaller than that, or have intermediate sizes, but I'll cover that later.

viewports

Now this works reasonably well when you resize desktop browsers [4], but not so much for mobile browsers. The problem is that mobile browsers (iPhone/Safari, Android/Chrome and Fennec) assume that the page was designed for a wide screen, and shrink it to fit into the smaller screen. This means that even though users could have had a good customised experience for their smaller devices, they won't, because the device doesn't know about this [5]. The trick is to use Apple's viewport [6, 7, 8] meta tag in your document's head in conjunction with @media queries [9]:
<meta name="viewport" content="...">
I've left the content attribute empty for now because this is where I see confusion... which is what we'll talk about now.
width=device-width
Most sites that I've seen advise you to set the content attribute to width=device-width. This tells the browser to assume that the page is as wide as the device. Unfortunately, this is only true when your device is in the portrait orientation. When you rotate to landscape, the device-width remains the same (eg: 320px), which means that even if your page were designed to work well in a 480px landscape design, it would still be rendered as if it were 320px.

It's tempting to use the orientation media query to solve this problem, but orientation doesn't really tell you the actual width of the device. All it tells you is whether the width is larger than or smaller than the device's height. As ppk points out [5], since most pages tend to scroll vertically, this is irrelevant.

Use this if you use the same page styles in portrait and landscape orientations. Also note that using width=device-width is the only way to tell Android devices to use the device's width [12].
initial-scale=1.0,maximum-scale=1.0
Setting initial-scale=1 tells the browser not to zoom in or out regardless of what it thinks the page width is. This is good when you've designed your page to fit different widths since the browser will use the appropriate CSS rules for its own width, and initial-scale stops the zooming problem that we faced without the viewport meta tag.

Unfortunately a bug, or more likely a mis-feature, in Mobile Safari messes this up when a device is rotated from portrait to landscape mode. initial-scale is honoured only on full page load. On rotate from portrait to landscape, the browser assumes that the page width stays the same and scales accordingly (1.5x) to make 320 pixels fit into 480 pixels. However, as far as @media queries go, it reports a 480px width, and uses the appropriate CSS rules to render the page. This results in a page designed for 480px being rendered scaled up 1.5 times. It's not horrible, but it's not desirable. Fennec claims [8] that it does the right thing in this case. The Android emulator is impossible to work with and I haven't tested on mobile Opera yet.

To get around this bug, the pixel perfection camp suggests also setting maximum-scale=1. This stops the page from zooming in on rotate, but it has the undesired side effect of preventing the user from zooming the page. This is a problem from the accessibility point of view. Zooming in is a very valid use case for users with bad eyesight, and in some cases, even users with good eyesight who just want a closer look at some part of your page. Do this only if you hate your users. It goes without saying that you should also not set user-scalable=no on most general purpose pages.

A better solution may be to design your page to use the same styles in portrait and landscape orientations and set width=device-width. This way even if it does zoom, it will still be proportionate. See Lanyrd [10] for an example of this design.
width=<actual width>
Some sites advise using a specific viewport width and designing your pages for that width. This is fine if you're building a separate page for each device class, but that doesn't flow with the concept of responsive design. Fixed width layouts are for print. The web is fluid and adapts to its users. Your site should too. Don't use this.
@media all and (device-width:480)
While this is a media query rather than an option to the viewport meta tag, I've seen it at various locations, and don't think it's the best option around. Here's why. According to the CSS3 media queries spec [11], the device-width media feature describes the width of the rendering surface of the output device. For continuous media, this is the width of the screen. For paged media, this is the width of the page sheet size.

We're dealing with continuous media here (on-screen as opposed to printed), in which case the spec states that this is the width of the screen. Unless the browser window is maximised, this might be larger than the viewport width. My tests show that most desktop browsers treat device-width and width as synonyms. Mobile browsers seem a little confused on the matter. As far as the viewport meta tag goes, device-width is the width of the device in portrait orientation only. For a 320x480 device, device-width is always 320px regardless of orientation. For CSS media queries, however, device-width is the width of the screen based on its current orientation.

If you are going to use this, use it in conjunction with the orientation media feature. Never use max-device-width and min-device-width. It's better to use max-width and min-width instead. Also remember that device widths may change with newer models. You want your design to be future proof.

Intermediate widths

I'd mentioned above that you could design for any number of widths. The important thing is to test your page for different browser widths. This is fairly easy to do just by resizing your browser window. Test, and whenever you find your page layout break, either fix the layout for all widths, or build a new layout for smaller widths.

On bluesmoon.info, I change many parts of the page depending on page width. The default design (at the time of this article) has 5% empty space around the content. This is fine on really wide screens (1152px or more), but as you get smaller, the empty space becomes a waste. Below 1152px, I shrink this to 2%, and below 1024px, I get rid of it completely. You could say that my page content was really built for 1024px. This design also works for the iPad in landscape mode.

Below 1000px, all 3-column pages switch to a 2-column layout. Below 580px, I move the right column below the main content on all pages. All pages that initially had 3 columns now have 2 columns below the main content.

As we get smaller, I reduce the space used by non-essential content like the footer, the sidebars and the menu at the top, leaving as much space as possible for main content. Finally, when we get below 380px, the whole page turns into a single column layout.

This is of course, just an example. Your own site may have a layout that works perfectly at all screen widths, or you may need to design only two or three layouts. It's easy to test and design, so there's no reason not to. Designing for multiple widths took me just a couple of hours, and a lot of it was spent reading the articles below.

Recommendations

So finally, this is what I recommend.
  1. DO use the viewport meta tag
  2. DO use media queries to render your page appropriately for various widths ranging from under 200px to 1024px or more
  3. DO use width=device-width,initial-scale=1 in your viewport meta tag OR use width=device-width alone [12].
  4. DO NOT use maximum-scale=1 or user-scalable=no
  5. DO NOT use width=<specific width>
  6. DO NOT use @media all and (*-device-width: xxx)

Remember that using initial-scale=1.0 leaves you open to a zooming bug in Mobile Safari. Push Safari to fix this bug. Finally, David Calhoun has a great summary [13] of all options to the viewport meta tag, and alternate meta tags for older phones. Well worth a read. Also note that Mozilla's documentation [8] of the viewport meta tag is far better than Safari's [7].
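
Putting those recommendations together, a reasonable starting point might look something like this (a sketch only; pick breakpoints that suit your own content):

<meta name="viewport" content="width=device-width,initial-scale=1">

/* Default wide-screen styles */

@media all and (max-width: 768px) {
    /* narrower desktop browsers and iPad portrait */
}

@media all and (max-width: 480px) {
    /* phones in landscape and really narrow browser windows */
}

@media all and (max-width: 320px) {
    /* phones in portrait */
}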

Footnotes & References

  1. Ethan Marcotte. 2010. Responsive Web Design. In A List Apart #306. ISSN: 1534-0295.
  2. Jeremy Keith. 2010. Responsive Enhancement. In adactio.
  3. Kayla Knight. 2011. Responsive Web Design: What It Is and How To Use It. In Smashing Magazine.
  4. Webkit-based desktop browsers re-render the page correctly as you resize the browser; however, they have a minimum width of 385px (on MacOSX) and I was unable to shrink the browser below this. Firefox 4 re-renders the page correctly until the width gets too narrow to fit the navigation toolbar. At that point the viewport width stays fixed even if you shrink the browser. The page is re-rendered if you type something (anything) into the URL bar. Opera 10/11 re-render correctly at all sizes.
  5. Peter Paul Koch. 2010. A tale of two viewports — part two. In Quirksmode.
  6. Using the Viewport on Safari. In Safari Web Content Guide.
  7. The viewport meta tag. In Safari HTML Reference.
  8. MDC. 2010. Using the viewport meta tag to control layout on mobile browsers. In Mozilla Developer Network.
  9. Peter Paul Koch. 2010. Combining meta viewport and media queries. In Quirksmode.
  10. Willison & Downe. Lanyrd.
  11. Lie et al. 2010. Media Queries. W3C Candidate Recommendation 27 July 2010.
  12. If you design your page for the narrow view and expect it to scale when rotated, then use width=device-width and nothing else. If, instead, you design your page for either width, then use width=device-width,initial-scale=1. This is the only way to get the android browser to render a page with the intended width. Mobile Safari will render the page exactly as if initial-scale=1 were specified alone. You will still end up with the zoom on rotate bug.
  13. David Calhoun. 2010. The viewport metatag (Mobile web part I).

Monday, October 18, 2010

Common Security Mistakes in Web Applications

I've just written my first article for Smashing Magazine. It's titled Common Security Mistakes in Web Applications. This is also my first security related post after joining the Yahoo! Paranoid group. The article covers XSS, CSRF, Phishing, SQL injection, Click-Jacking and Shell injection.

Wednesday, October 13, 2010

What's a browser? — the developer edition

Nicholas Zakas has a great writeup explaining a web browser to non-technical people. I thought I'd take this opportunity to explain what a web browser is to web developers.

At the heart of it, a web browser is two things. It is a program that can communicate with a web server using the HTTP protocol, and it is a program that can render HTML and other associated content types... except that it might not care to. As web developers looking out at the world through the window of a TCP socket on port 80, all we see is an agent on the other end issuing GET and POST requests. What it does with our responses, we don't really know. We do our best to cater to what we can identify. We look at user-agent strings to build statistics of the kinds of browsers that visit our site, and we use JavaScript to detect capabilities that we can use to enhance the experience that we deliver, but what if that JavaScript were never executed and the user-agent string were a lie?

No, at the heart of it, a web browser is just one thing — an HTTP client.

Built upon this HTTP client could be various entities. A real web rendering engine, a crawling bot, an audio browser, a web service client, or a script kiddie using curl. While it may be impossible to know of all possible entities on the other end, as web developers, we must build applications that are prepared to deal with anything.

We use progressive enhancement to create an engaging experience for genuine users of our site regardless of the capabilities of the user agent they use. We validate everything that comes in over that HTTP connection to prevent the destruction or degradation of our service either by malice or accident, and we trust nothing that comes in from the other end. Not the POST data, not the query string, not the cookies, not the request time, and certainly not the user agent string.

Do we assume our users are the kind that Nicholas describes or do we assume that they're all out to destroy us, or perhaps somewhere in between? The reality is that we have to build our sites for all of them. Start with HTTP and progressively enhance from there.

Monday, July 26, 2010

4.01 Strict — Invention of the Web

Tim Berners-Lee invents the WWW

Tim Berners-Lee invents the Web

There was a young man named Tim,
Who wanted to share docs on a whim,
He built HTTP,
But the W3C
Made standards so webdevs would sin

Monday, July 12, 2010

4.01 Strict — SpeedFreak

Mild mannered Stoyan Stefanov makes an appearance in this week's 4.01 Strict.

Stoyan (the speedfreak) Stefanov saves the web

Saturday, April 10, 2010

Can a website crash your RHEL5 desktop?

A few months ago I started having trouble with my RHEL5 desktop when running Firefox 3.5. On a few websites, the entire system would crash pretty consistently. It took a long time, but I finally managed to find the problem and get a fix in there.

My solution is documented on the YDN blog. Please leave comments there.

Edit 2022-10-31: It looks like the YDN blog no longer has any posts, so I've pulled this off the Internet Archive and reposted it here:

On Ajaxian, Chris Heilmann recently wrote about a piece of JavaScript to crash Internet Explorer 6 (IE6). That's not something I worry about because I'm a geek and I've used a Linux-based operating system as my primary desktop for the last 10 years. I've kept my system up to date with all patches, never log in as root, and have a short timeout on sudo. I've believed that while a malicious website could possibly affect my browser (Firefox), it was unlikely to affect my OS. That was up until a few months ago, when I upgraded to Firefox 3.5.

I started noticing that a few websites would consistently cause my system to freeze and the bottom part of the screen would show pixmaps from all over the place. The system would stay this way for a few seconds, and then I'd be thrown off to the login screen. My error log showed that X.org had been killed by a segfault. At times the system would completely freeze and the only way to get it back was a hard reboot (yes, I tried pinging and sshing in first).

Yikes. This wasn't supposed to happen. Even worse, this meant that anyone who knew how to exploit this could cause my system to crash at will. On further investigation, it appeared that this problem showed up with sites that used jQuery or YUI, but it wasn't consistent. It also happened only with Firefox 3.5 or higher on Red Hat-based systems. Debian-based systems like Ubuntu didn't have any trouble.

I also found that we could consistently reproduce the problem with Yahoo! Search, which is where Ryan Grove and Sarah Forth-Smith came in to debug the problem. Even weirder was that my Gnome desktop would bleed through elements on the Search results page. Eventually we hit upon Bug 498500 on Red Hat's Bugzilla bug-tracking system.

I edited /etc/X11/xorg.conf and added Option "XaaNoOffscreenPixmaps" to the Device Section. I restarted X and started surfing. I surfed for several weeks and used Y! Search all the time. I also used a bunch of the other sites that caused the previous problems. I used sites with jQuery and YUI.
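
For reference, the relevant section of xorg.conf ends up looking something like this (the Identifier and Driver values below are placeholders; yours will differ):

Section "Device"
    Identifier "Videocard0"
    Driver     "intel"
    # work around the offscreen pixmap crash (Red Hat Bug 498500)
    Option     "XaaNoOffscreenPixmaps"
EndSection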

No more screen fuzz, no more freezes, no more crashes, and no more reboots.

I haven't investigated this further, but my best guess for what would have caused this problem is CSS sprites that are partially hidden, or elements with negative left margins. The former is a common performance optimization, while the latter is common for page accessibility. Both good things, so not something you'd want to change.

In any event, if you're like me and have a Linux-based desktop, and see a similar problem, it may be worth trying the preceding solution.

Note: The bug in question has been resolved by Red Hat.

Friday, February 19, 2010

Missing kids on your 404 page

It's been a long time since I last posted, and unfortunately I've been unable to churn out a post every week. The month of February has been filled with travel, so I haven't had much time to write.

My report on FOSDEM is up on the YDN blog, so I haven't been completely dormant. I also did some stuff at our internal hack day last week. This post is about one of my hacks.

The idea is quite simple. People land up on 404 pages all the time. 404 pages are pages that have either gone missing, or were never there to begin with. 404 is the HTTP error code for a missing resource. Most 404 pages are quite bland, simply stating that the requested resource was not found, and that's it. Back when I worked at NCST, I changed the default 404 page to use a local site search based on the requested URL. I used the namazu search engine since I was working on it at the time.

This time I decided to do something different. Instead of searching the local site for the missing resource, why not engage the user in trying to find missing kids?

I started by trying to find an API for missingkids.com and ended up finding missingkidsmap.com. This service takes the data from Missing Kids and puts it on a Google map. The cool thing about the service was that it could return data as XML.

Looking through the source code, I found the data URL:
http://www.missingkidsmap.com/read.php?state=CA
The state code is a two letter code for states in the US and Canada. To get all kids, just pass in ZZ as the state code.

The data returned looks like this:
<locations>
   <maplocation zoom="5"
                state_long="-119.838867"
                state_lat="37.370157"/>
   <location id="1"
             firstname="Anastasia"
             lastname=" Shearer "
             picture="img width=160 target=_new src=http://www.missingkids.com/photographs/NCMC1140669c1.jpg"
             picture2="img width=160 target=_new src=http://www.missingkids.com/photographs/NCMC1140669e1.jpg"
             medpic = "img width=60 border=0 target=_new src=http://www.missingkids.com/photographs/NCMC1140669c1.jpg"
             smallpic="img width=30 border=0 target=_new src=http://www.missingkids.com/photographs/NCMC1140669c1.jpg"
             policenum="1-661-861-3110"
             policeadd="Kern County Sheriff\'s Office (California)"
             policenum2=""
             policeadd2=""
             st=" CA"
             city="BAKERSFIELD"
             missing="12/26/2009"
             status="Endangered Runaway"
             age="16"
             url="1140669"
             lat="35.3733333333333"
             lng="-119.017777777778"/>
   ...
</locations>

Now I could keep hitting this URL for every 404, but I didn't want to kill their servers, so I decided to pass the URL through YQL and let them cache the data. Of course, now that I was passing it through YQL, I could also do some data transformation and get it out as JSON instead of XML. I ended up with this YQL statement:
SELECT * From xml
 Where url='http://www.missingkidsmap.com/read.php?state=ZZ'
Pass that through the YQL console to get the URL you should use. The JSON I got back looked like this:
{
   "query":{
      "count":"1",
      "created":"2010-02-19T07:30:44Z",
      "lang":"en-US",
      "updated":"2010-02-19T07:30:44Z",
      "uri":"http://query.yahooapis.com/v1/yql?q=SELECT+*+From+xml%0A+Where+url%3D%27http%3A%2F%2Fwww.missingkidsmap.com%2Fread.php%3Fstate%3DZZ%27",
      "results":{
         "locations":{
            "maplocation":{
               "state_lat":"40.313043",
               "state_long":"-94.130859",
               "zoom":"4"
            },
            "location":[{
                  "age":"7",
                  "city":"OMAHA",
                  "firstname":"Christopher",
                  "id":"Szczepanik",
                  "lastname":"Szczepanik",
                  "lat":"41.2586111111111",
                  "lng":"-95.9375",
                  "medpic":"img width=60 border=0 target=_new src=http://www.missingkids.com/photographs/NCMC1141175c1.jpg",
                  "missing":"12/14/2009",
                  "picture":"img width=160 target=_new src=http://www.missingkids.com/photographs/NCMC1141175c1.jpg",
                  "picture2":"",
                  "policeadd":"Omaha Police Department (Nebraska)",
                  "policeadd2":"",
                  "policenum":"1-402-444-5600",
                  "policenum2":"",
                  "smallpic":"img width=30 border=0 target=_new src=http://www.missingkids.com/photographs/NCMC1141175c1.jpg",
                  "st":" NE",
                  "status":"Missing",
                  "url":"1141175"
               },
               ...
            ]
         }
      }
   }
}

Step 2 was to figure out whether the visitor was from the US or Canada, and if so, figure out which state they were from and pass that state code to the URL.

This is fairly easy to do at Yahoo!. Not so much on the outside, so I'm going to leave it to you to figure it out (and please let me know when you do).

In any case, my code looked like this:
$json = http_get($missing_kids_url);
$o = json_decode($json, 1);
$children = $o['query']['results']['locations']['location'];

$child = $children[array_rand($children)];

print_404($child);
http_get is a function I wrote that wraps around curl_multi to fetch a URL and cache it locally. print_404 is the function that prints out the HTML for the 404 page using the $child data object. The object's structure is the same as each of the location elements in the JSON above. The important parts of print_404 are:
function print_404($child)
{
   $img = preg_replace('/.*src=(.*)/', '$1', $child["medpic"]);
   $name = $child["firstname"] . " " . $child["lastname"];
   $age = $child['age'];
   $since = strtotime(preg_replace('|(\d\d)/(\d\d)/(\d\d\d\d)|', '$3-$1-$2', $child['missing']));
   if($age == 0) {
      $age = ceil((time()-$since)/60/60/24/30);
      $age .= ' month';
   }
   else
      $age .= ' year';

   $city = $child['city'];
   $state = $child['st'];
   $status = $child['status'];
   $police = $child['policeadd'] . " at " . $child['policenum'];

   header('HTTP/1.0 404 Not Found');
?>
<html>
<head>
...
<p>
<strong>Sorry, the page you're trying to find is missing.</strong>
</p>
<p>
We may not be able to find the page, but perhaps you could help find this missing child:
</p>
<div style="text-align:center;">
<img style="width:320px; padding: 1em;" alt="<?php echo $name ?>" src="<?php echo $img ?>"><br>
<div style="text-align: left;">
<?php echo $age ?> old <?php echo $name ?>, from <?php echo "$city, $state" ?> missing since <?php echo strftime("%B %e, %Y", $since); ?>.<br>
<strong>Status:</strong> <?php echo $status ?>.<br>
<strong>If found, please contact</strong> <?php echo $police ?><br>
</div>
</div>
...
</body>
</html>
<?php
}
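For completeness, here's a simplified sketch of what such an http_get could look like, using plain curl with a file-based cache (hypothetical cache path and TTL) rather than the curl_multi wrapper described above:

function http_get($url, $ttl = 3600)
{
   $cachefile = "/tmp/404cache." . md5($url);   // hypothetical cache location

   // serve from the local cache if it's fresh enough
   if (file_exists($cachefile) && time() - filemtime($cachefile) < $ttl) {
      return file_get_contents($cachefile);
   }

   $ch = curl_init($url);
   curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
   curl_setopt($ch, CURLOPT_TIMEOUT, 5);
   $data = curl_exec($ch);
   curl_close($ch);

   if ($data !== false) {
      file_put_contents($cachefile, $data);
   }

   return $data;
}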
Add in your own CSS and page header, and you've got missing kids on your 404 page.

The last thing to do is to tell apache to use this script as your 404 handler. To do that, put the page (I call it 404.php) into your document root, and put this into your apache config (or in a .htaccess file):
ErrorDocument 404 /404.php
Restart apache and you're done.

Update: 2010-02-24 To see it in action, visit a missing page on my website. eg: http://bluesmoon.info/foobar.

Update 2: The code is now on github: http://github.com/bluesmoon/404kids

Update: 2010-02-25 Scott Hanselman has a Javascript implementation on his blog.

Update: 2010-03-28 There's now a drupal module for this.

Tuesday, November 24, 2009

Measuring a user's bandwidth

In my last post about performance, I spoke about measurement. Over the last few days I've been looking at bandwidth measurement. These ideas have been floating around for years and we've tested some before at Yahoo!, but I wanted to try a few new things.

Try it out now.


The concept is actually quite simple.
  1. Try to download multiple images with progressively increasing sizes
  2. Set a reasonable timeout for the images to download
  3. Stop at the first one that times out - that means that we have enough data to make an estimation.
  4. Calculate the bandwidth by dividing each image's size by the time it took to download.
I run this test a few times, and then run some statistical analysis on the data gathered. The analysis is pretty basic. I first pull out the geometric mean of the data, then sort the data, run IQR filtering on it, and then pull out the median point. I use the geometric mean as well as the post-IQR-filtered median because I'm not sure at this point which is more resilient to temporary changes in network performance. This data is then stored in a database along with the user's IP address and the current timestamp.
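
At its core, each sample is just an image download wrapped in timers. A stripped-down sketch of the idea (hypothetical image URL; the real test adds timeouts and progressively larger images):

function timeImage(url, bytes, callback) {
    var img = new Image(),
        start = new Date().getTime();

    img.onload = function() {
        var seconds = (new Date().getTime() - start) / 1000;
        callback(bytes / seconds);    // bandwidth in bytes/sec
    };

    // cache buster so we actually hit the network
    img.src = url + "?t=" + start;
}

timeImage("/images/bw-test-32k.png", 32768, function(bps) {
    // collect samples here and run the stats described above
});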

I also try to measure latency. This is not network latency, but server latency from the user's point of view, ie, how long it takes between the request and the first byte of the response. I run this test multiple times and do the same kind of stats on this data.

The goal of this test

The few people I've shown this to all had the same question: what's the goal of this test? There are already several free bandwidth testers available that one can use to determine one's bandwidth, so what does this do differently?

The way I see it, as a site owner, I don't really care about the bandwidth that my users have with their ISPs - unless, of course, I have my servers in the ISP's data centre. I really care about the bandwidth that users experience when visiting my website. This test aims to measure that. Ideally, this piece of code can be put into any web page to measure the user's bandwidth in the background while he's interacting with your site. I don't know how it will work in practice though.

Insights from the data

I don't really know. It could be useful to figure out what users from different geographical locations experience. Same with ISPs. It might also just tell me that dreamhost is a really bad hosting provider.

Data consistency

In my repeated tests, I've found that the data isn't really consistent. It's not all over the place, but it fluctuates a fair bit. I've seen different levels of consistency when using the geometric mean and the median, but I don't think I have enough data yet to decide which is more stable. This could mean that my server just responds differently to multiple requests or it could mean many other things. I don't really know, but feel free to leave a comment if you do.

Credits

I don't know who first came up with the idea of downloading multiple images to test bandwidth, but it wasn't my idea. The latency test idea came from Tahir Hashmi and some insights came from Stoyan Stefanov.

Once again, here's the link.

Short URL: http://tr.im/bwmeasure

Sunday, November 01, 2009

Performance measurement

In my last post, I mentioned the factors that affect web performance. Now that we know what we need to measure, we come to the harder problem of figuring out how to measure each of them. There are different methods depending on how much control you have over the system and the environment it runs in. Additionally, measuring performance in a test setup may not show you what real users experience, however it does give you a good baseline to compare subsequent tests against.

Web, application and database servers

Back end servers are the easiest to measure because we generally have full control over the system and the environment it runs in. The set up is also largely the same in a test and production environment, and by replaying HTTP logs, it's possible to simulate real user interactions with the server.

Some of the tools one can use to measure server performance are:
  • ab - Apache Benchmark. Despite its name, it can be used to test any kind of HTTP server and not just apache. Nixcraft has a good tutorial on using ab.
  • httperf from HP labs is also a good tool to generate HTTP load on a server. There's an article on Techrepublic about using it. I prefer httperf because it can be configured to simulate real user load.
  • Log replaying is a good way to simulate real-user load, and a few people have developed scripts to replay an apache log file. The first one uses httperf under the hood.
  • To measure database performance, we could either put profiling code into our application itself, and measure how long it takes for our queries to return under real load conditions, or run benchmarks with the actual queries that we use. For mysql, the mysql benchmarking suite is useful.
  • MySQL Tuner is another tool that can tell you how your live production server has been performing though it doesn't give you numbers to quantify perceived performance. I find it useful to tell me if my server needs retuning or not.
The above methods can also be used to measure the performance of remote web service calls, though you may want to talk to your remote web service provider before doing that.

I won't write any more about these because there are a lot of articles about server side performance measurement on the web.

DNS, CDNs and ISP networks

Measuring the performance of DNS, CDNs and your user's ISP network is much harder because you have control over neither the systems nor the environment. Now I mentioned earlier that DNS is something you can control. I was referring to your own DNS set up, ie, the hostnames you have and how they're set up. This is not something we need to measure since no user will use your DNS server. All users use their ISP's DNS server or something like OpenDNS and it's the performance of these servers that we care about.

DNS

DNS is the hardest of the lot since the only way to measure it is to actually put a client application on your users' machines and have that do the measurement. Unless you have really friendly users, this isn't possible. It is an important measurement though. A paper on DNS Performance [Jung et al., 2002] shows that around 20% of all DNS requests fail. This in turn adds to the overall perceived latency of a website. In the absence of an easy way to measure this performance from within a web page, we'll try and figure it out as a side-effect of other measurements.

One possible method is to request the same resource from a host, the first time using the hostname and the second time using its IP address. The difference should give you the DNS lookup time. The problem with this is that it sort of breaks DNS rotations where you may have multiple physical hosts behind a single hostname. It's even worse with a CDN because the hostname may map onto a server that's geographically closer to the user than the IP address you use. In short, you'd better know what you're doing if you try this.

ISP bandwidth

With ISP networks, the number we really care about is the user's effective bandwidth, and it isn't hard to measure this. We use the following procedure:
  1. Place resources of known fixed sizes on a CDN
  2. Make sure these resources are served with no-cache headers
  3. Using javascript, download these resources from the client machine and measure the time it takes
  4. Discard the first resource since it also pays the price of a DNS lookup and TCP slowstart
  5. Use resources of different sizes to handle very slow and very fast connections.
The number we get will be affected by other things the user is using the network for. For example, if they're streaming video at the same time, then bandwidth measured will be lower than it should be, but we take what we can get.

CDNs

Now to measure bandwidth, we need to get that resource relatively close to the user so that the bandwidth of the whole internet doesn't affect it. That's where CDNs come in, and measuring a CDN's performance is somewhat similar.

You could always use a tool like Gomez or Keynote to do this measurement for you, or you could hack up a solution yourself in Javascript. You need to figure out three things:
  1. The IP of the CDN closest to the user
  2. The user's geo-location which you can figure out from their IP address
  3. The time it takes to download a resource of known size from this CDN
It's that first one that's a toughie, but the simplest way to figure it out is to just ask your CDN provider. Most CDNs also provide you with their own performance reports.

Page content and user interaction

YSlow, Show Slow, Page Speed and Web Page Test are good tools for measuring and analysing the performance of your page content. They can measure and analyse your page from your development environment and suggest improvements. They do not, however, measure real user perceived performance, but this is something we can do with Javascript.

We primarily need to measure the time it takes to download a page and all its components. Additionally, we may want to time how long certain user interactions with the page took. All of these can be accomplished by reading the Date() object in javascript at the correct start and end times. What those start and end times are depends on your application, but we'll look at one possible implementation in a later post. Once you have the timing information that you need, it can be sent back to your server using a javascript beacon. We'll go into more detail about this as well in a later post.
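
As a taste of what's to come, the basic pattern is something like this (hypothetical beacon URL; the start time really needs to be captured as early in the page as possible):

var start = new Date().getTime();

window.onload = function() {
    var loadTime = new Date().getTime() - start;

    // beacon the measurement back to the server as an image request
    var beacon = new Image();
    beacon.src = "/beacon.gif?t=" + loadTime;
};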

This post has already run longer than I'd hoped for, so I'm going to stop here and will continue next time.

About web performance

I currently work with the performance team at Yahoo!. This is the team that did the research behind our performance best practices and built YSlow. Most of our past members write and speak about performance, and while I've done a few talks, I've never actually written a public post about web performance. I'm going to try and change that today.

Note, however, that this blog is about many technical topics that interest me and web performance is just a part of that.

I'm never sure how to start a new series, especially one that's been spoken about by others, but since these blog posts also serve as a script for the talks that I do, I thought I'd start with the last performance talk that I did.

Improving a website's performance starts with measuring its current performance. We need a baseline measurement that will help us determine if the changes we make cause an improvement or a regression in performance. Before we start with measurement, however, we need to know what to measure, and for that we need to look at all the factors that contribute to the time it takes for a website to get to the user.
User perceived web app time is spent in looking up stuff, building stuff, downloading stuff, rendering stuff and interacting with stuff.
It's this perceived time that we need to reduce, and consequently measure. All of the above fall into two basic categories:
  1. Infrastructure
  2. Content structure
Each of these in turn is made up of components that we as developers can control, and those that we cannot. We'd like to be able to measure everything and fix whatever we have control over. I've split the components that I see into this table so we know what can be looked at and who should do the looking.

Can control
  • Infrastructure: Web server & App server, Database server, Web service calls, CDNs, DNS
  • Content: HTTP headers, HTML, Images, Flash, CSS, Javascript, Fonts

Cannot control
  • Infrastructure: ISP's DNS servers, ISP's network, User's bandwidth, User's browser & plugins, Other apps using the user's network, The internet
  • Content: Advertisements, Third party content included as badges/feeds, Third party sites that link to your page

If you have more items to add to this table, leave a comment and I'll add it in.

This is where we can jump to Yahoo!'s performance rules. At the time of this post, there are 34 of them divided into 7 categories. I'll go into more details and refer to these rules in later posts. That's all for this introductory post though.

Sunday, January 29, 2006

Progressive Enhancement via μMVC - I

The web today is like a huge buzzword bingo game. There's so much flying around that it's hard to stay in touch unless you're in it day in and day out. That's not something that old school engineers like me find easy. I'm far more comfortable staring at my editor, hacking code to interact with a database or some hardware. Interacting with users is tough. Doing it with sound engineering principles is even tougher.

I'm going to take a deep breath now and mention all the web buzzwords that I can think of and somehow fit them into this article.

AJAX, RIA, JSON, XML, XSLT, Progressive Enhancement, Unobtrusiveness, Graceful Degradation, LSM, Accessibility.

Definitions

Let's get a few definitions in there:
AJAX
A generic term for the practice of asynchronously exchanging data between the browser and server without affecting browsing history. AJAX often results in inline editing of page components on the client side.
RIA
Rich Internet Applications - Web apps built to feel like desktop applications. Most often built using AJAX methods and other funky user interactions
JSON
A popular new data interchange format used to exchange data between programs written in different languages. Extremely useful for compactly sending data from a server side script to a Javascript function on the client.
XML
A common, but verbose and slow-to-parse, data interchange/encapsulation format used to exchange data between client and server.
XSLT
XSL Transformations - transform XML to something else (most likely HTML) using rules. Can be executed on either client or server depending on capabilities
Progressive Enhancement
The practice of first building core functionality and then progressively adding enhancements to improve usability, performance and functionality
Unobtrusiveness
The practice of adding a progressive enhancement without touching existing code
Graceful Degradation
The ability of an application to gracefully retain usability when used on devices that do not support all required features, if necessary by degrading look and feel. Graceful Degradation follows from Progressive Enhancement
LSM
Layered Semantic Markup - The practice of building an application in layers. At the lowest layer is data encapsulated in semantic markup, ie, data marked up with meaning. Higher layers add style and usability enhancements. LSM enables Progressive Enhancement and Graceful Degradation.
Accessibility
The ability of an application to be accessed by all users and devices regardless of abilities or capabilities.
See Also: Progressive Enhancement at Wikipedia, Progressive Enhancement from the guy who coined the term, Progressive Enhancement from Jeremy Keith, Ajax, Graceful Degradation, Layered Semantic Markup, JSON

We'll get down to what this article is about, but first let me add my take on LSM.

LSM's layers

While LSM suggests development in layers, it doesn't specify what those layers should be. Traditionally, developers have looked at three layers: Semantic Markup, Semantic CSS and Javascript. I'd like to take this one level further.

The way I see it, we have 4 (or 5) layers.

Layers 1 and 2 are semantic markup (HTML) and semantic classes (CSS). Layer 3 in my opinion should be restricted to unobtrusive javascript added for UI enhancements. This would include drag and drop, hidden controls, and client side form validation, but no server communication.

Layer 4 adds the AJAX capability, however, just like Layer 3 does not absolve the back end from validating data, layer 4 does not absolve the back end from producing structured data.

Right down at the bottom is synchronous, stateless HTTP (Layer 0).

And now, back to our show.

Web application frameworks and MVC

There's been a lot of work in recent times to build web application development frameworks that make it easy for a developer to add AJAX methods to his app. Tools like Ruby on Rails, Django, Dojo and others do this for the user, and build on time tested design patterns.

For a long while web application frameworks have implemented the MVC pattern. Current frameworks merely extend it to move some parts of the view and controller to the client side instead of doing it all server side.

See also: MVCs in PHP, Intro to MVCs in PHP5, The controller, The view.

The problem with this is that your code is now fragmented between client and server, and implemented in different languages, possibly maintained by different programmers. Questions arise as to whether the bulk of your code should go into the server or the client, and of course, which model degrades best to account for accessibility?

Brad Neuberg has an excellent article on the pros and cons of each approach, and when you should choose which.

He still leaves my second question unanswered, but Jeremy Keith answers it with Hijax, his buzzword for hijacking a traditionally designed page with AJAX methods... in other words, progressive enhancement.

I've had thoughts that ran parallel to Jeremy's and it was quite odd that we ended up speaking about almost the same ideas at the same place and time. Well, he published and I didn't, so my loss.

Jeremy's ideas are spot on, but he doesn't mention implementation specifics, or whether the same code base can be used for more than just adding Ajax to an existing application.

More about MVC

The MVC pattern is great in that it doesn't state what your view should be, but merely that it should not be tightly coupled with your application model. Most implementers look at it as a way of designing an entire application around a single controller. Every action and subaction corresponds to a controller branch, which in turn decides how data should be manipulated, and which view to call.

While this is good (if implemented correctly) at the high level, it is complex, and prone to bad design. It's not surprising that the big boys get wary when MVCs for web apps and PHP in particular are mentioned.

μMVC

If instead, we look at views as just views of data, and different views of the same data, then we end up with a different structure. Instead of selecting a view based on the action to be performed, we select a view based on the output format that the user wants. This may be HTML in the default case, where the top level controller merely stitches various HTML subviews together to form the entire page, with each subview sent across to the browser as soon as it's ready, to improve performance.

If the user has other capabilities though, we send data in a different format, and chances are, we don't need to send across all subviews. A single subview that's very specific to the data requested is sufficient. We do less work on the server, fewer database queries, send less data across the network and improve performance overall on client and server. The data format selected depends on the client application, and may be an html snippet that goes in to innerHTML, a JSON datastructure that gets parsed on the client side, javascript code that gets evaled on the client side, or XML data returned as a web service or for client side XSL transforms.

We use exactly the same data processing code for all requests, and only switch on the final function that transforms your internal data structures to the required output format.
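
As a rough sketch of that final switch in PHP (the helper functions here are hypothetical; part II will cover real implementations):

// the model/controller work is identical for every request
$data = fetch_requested_data($_GET['item']);    // hypothetical model call

// only the final transform depends on the requested output format
switch ($_GET['format']) {
    case 'json':
        header('Content-Type: application/json');
        echo json_encode($data);
        break;
    case 'xml':
        header('Content-Type: text/xml');
        echo to_xml($data);                     // hypothetical serialiser
        break;
    default:
        render_html_subview($data);             // hypothetical HTML view
        break;
}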

I call this a micro MVC (μMVC) because the model, view and controller all act on a very fine granularity without considering overall application behaviour. Note also that the view and controller are now split across the server and client.

The client side controller kicks in first telling the server side controller what it's interested in. The server side controller performs data manipulation, and invokes the server side view. The client side controller passes the server side view to the client side view for final display.

This development model fits in well with the LSM framework which in turn leads to Progressive Enhancement and Graceful Degradation, and most of all, it opens up new avenues of accessibility without excessive degradation.

In part II of this article, I'll go into implementation details with examples in PHP and some amount of pseudocode.
