[philiptellis] /bb|[^b]{2}/
Never stop Grokking


Saturday, January 29, 2011

Printing Unicode characters in web documents

I often have to look up reference sites to find out how to write a particular character in HTML, JavaScript or CSS when that character isn't on my keyboard. This post should save me some searching time in future.

To type a character that's not on your keyboard, you need its Unicode code point in decimal or hexadecimal. In the notation below, HH means two hexadecimal digits, DD means two decimal digits, HHHH means four hexadecimal digits, and so on. DD+ means two or more decimal digits, and HH+ means two or more hexadecimal digits.
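If you already have the character (say, copied from a web page) but not its code point, JavaScript can tell you. This is a minimal sketch that assumes an engine with String.prototype.codePointAt (ES2015 or later, which covers current browsers and Node.js). codePointInfo is just a name I made up for this example.

```javascript
// Look up the code point of a character in both decimal and hexadecimal.
// codePointAt(0) returns the full code point, even for characters above
// U+FFFF that JavaScript stores as two UTF-16 code units.
function codePointInfo(ch) {
  const cp = ch.codePointAt(0);
  return {
    decimal: String(cp),
    hex: cp.toString(16).toUpperCase(),
  };
}

console.log(codePointInfo("‡")); // { decimal: '8225', hex: '2021' }
console.log(codePointInfo("Ɖ")); // { decimal: '393', hex: '189' }
```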

HTML

To type out Unicode characters in HTML, use one of the following numeric character references:
  • &#DD+;
  • &#xHH+;
For example:
Ɖ == &#393;
Ɖ == &#x189;
‡ == &#x2021;
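If you'd rather generate these references than work them out by hand, a couple of small helpers will do it. This is a sketch assuming ES2015 or later; htmlDecimalRef and htmlHexRef are hypothetical helper names, not built-ins.

```javascript
// Build HTML numeric character references for a single character.
// Decimal form: &#DD+;   Hexadecimal form: &#xHH+;
function htmlDecimalRef(ch) {
  return "&#" + ch.codePointAt(0) + ";";
}

function htmlHexRef(ch) {
  return "&#x" + ch.codePointAt(0).toString(16).toUpperCase() + ";";
}

console.log(htmlDecimalRef("Ɖ")); // &#393;
console.log(htmlHexRef("Ɖ"));     // &#x189;
console.log(htmlHexRef("‡"));     // &#x2021;
```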

JavaScript

To type out Unicode characters in a JavaScript string, use the following escape:
  • \uHHHH
For example:
‡ == \u2021
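Once the source is parsed, a \uHHHH escape and the literal character are the same string. The sketch below checks that and builds escapes from an existing string. jsEscape is a hypothetical helper, and padStart assumes ES2017 or later. Because \uHHHH holds exactly four hex digits, a character above U+FFFF comes out as two escapes (a UTF-16 surrogate pair).

```javascript
// The escape and the literal character produce identical strings.
console.log("\u2021" === "‡"); // true

// Produce a \uHHHH escape for every UTF-16 code unit in a string.
// Values shorter than four hex digits are zero-padded, since \uHHHH
// always takes exactly four digits.
function jsEscape(str) {
  let out = "";
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i); // one UTF-16 code unit, 0..0xFFFF
    out += "\\u" + unit.toString(16).toUpperCase().padStart(4, "0");
  }
  return out;
}

console.log(jsEscape("‡"));                           // \u2021
console.log(jsEscape("Ɖ"));                           // \u0189
console.log(jsEscape(String.fromCodePoint(0x1F600))); // \uD83D\uDE00
```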

CSS

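To print a Unicode character with the CSS content property, use the following escape:
  • \HH+ (up to six hex digits; a single space after the escape ends it and is not printed, so add one whenever the next character could be read as a hex digit)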
For example:
‡ == \2021
(Note: the CSS example I've used on this page only works in browsers that support the :before pseudo-element and the content property, but in general you can use Unicode escapes anywhere in CSS.)
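This sketch builds a CSS escape and a :before rule like the one used on this page. cssEscape and the footnote class name are hypothetical. It always appends a space after the escape, because the CSS parser swallows one space following an escape, so the space never appears in the output.

```javascript
// Build a CSS escape for one character: a backslash followed by the
// code point in hex (CSS allows up to six hex digits). The trailing
// space terminates the escape and is consumed by the CSS parser; it
// also stops a following hex-looking character (such as "a" or "1")
// from being read as part of the escape.
function cssEscape(ch) {
  return "\\" + ch.codePointAt(0).toString(16).toUpperCase() + " ";
}

// A :before rule in the style of this post's example.
// The class name "footnote" is hypothetical.
const rule = '.footnote:before { content: "' + cssEscape("‡") + '"; }';
console.log(rule); // .footnote:before { content: "\2021 "; }
```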

URL

A URL is a different context from HTML, with its own escaping rules, so I'm including it here.

To put a Unicode character into a URL, you need to represent it in UTF-8 and write each byte in %HH notation.
For example:
‡ == %E2%80%A1
л == %D0%BB
' == %27
This is not something that you want to do by hand, so use a library to do the conversion. In JavaScript, you can use the encodeURI or encodeURIComponent functions to do this for you.
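Here's what that looks like with encodeURIComponent, plus a by-hand version for comparison. This is a sketch assuming a JavaScript environment with TextEncoder (modern browsers, Node.js 11 or later); percentEncodeAll is a hypothetical helper. Note that encodeURIComponent deliberately leaves some ASCII punctuation unescaped, including the apostrophe.

```javascript
// encodeURIComponent converts to UTF-8 and writes each byte as %HH.
console.log(encodeURIComponent("‡")); // %E2%80%A1
console.log(encodeURIComponent("л")); // %D0%BB

// It leaves a few ASCII characters alone, including the apostrophe.
console.log(encodeURIComponent("'")); // '

// The same conversion done by hand: get the UTF-8 bytes with
// TextEncoder and write every byte as %HH, escaping everything.
function percentEncodeAll(str) {
  const bytes = new TextEncoder().encode(str);
  return Array.from(bytes, (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")).join("");
}

console.log(percentEncodeAll("‡")); // %E2%80%A1
console.log(percentEncodeAll("'")); // %27
```

Use encodeURIComponent for a single piece of a URL, such as a query string value. encodeURI is meant for a whole URL, so it leaves characters like /, ? and & alone.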

End notes

Use escape sequences in only two cases:
  1. Your editor or keyboard doesn't let you type the characters directly.
  2. The characters could be misinterpreted as syntax, e.g. < or > in HTML.
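For the second case, the usual approach in JavaScript is to replace the markup-significant characters with entity or numeric references before putting text into HTML. A minimal sketch; escapeHtml is a hypothetical helper, not a built-in. The apostrophe becomes &#39;, using its decimal code point.

```javascript
// Escape the characters HTML would otherwise read as markup.
// & is replaced first so the ampersands introduced by the later
// replacements are not escaped a second time.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

console.log(escapeHtml('<a href="x">Tom & Jerry</a>'));
// &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
```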


5 comments:

tex texin
January 30, 2011 3:09 PM

You also use character escapes when the characters are ambiguous just looking at them. For example, no-break space vs space, or the several dashes/hyphens, etc. Using the escape instead of the actual character makes it clear in the markup or code, which is being used.

tex texin
January 30, 2011 3:23 PM

I am not sure why you say that the CSS example only works for the one case. Most browsers support escapes in CSS.
You might clarify that the CSS escape requires a space or other non-HEX character after it if you use the form shorter than 6 hex digits.

Philip
January 30, 2011 4:25 PM

I was referring to the example I used in this page. I specifically used :before and content, so older IE browsers will show nothing in my example.

Anonymous
January 31, 2011 11:00 AM

Or, since you probably need to look up the character anyway, you could do so using a tool that allows you to copy and paste it into your source, such as http://rishida.net/scripts/uniview/ or one of my pickers at http://rishida.net/scripts/pickers (see for example the Latin picker).

Btw, there's also http://rishida.net/tools/conversion for converting between all these various escape formats.

Hope that helps.

Philip
January 31, 2011 2:35 PM

Actually, most operating systems have a built-in character picker... at least Linux and Mac OS X do. I don't know about Windows.
