[philiptellis] /bb|[^b]{2}/
Never stop Grokking


Friday, February 19, 2010

Missing kids on your 404 page

It's been a long time since I last posted, and unfortunately I've been unable to churn out a post every week. The month of February has been filled with travel, so I haven't had much time to write.

My report on FOSDEM is up on the YDN blog, so I haven't been completely dormant. I also did some stuff at our internal hack day last week. This post is about one of my hacks.

The idea is quite simple. People land up on 404 pages all the time. 404 pages are pages that have either gone missing, or were never there to begin with. 404 is the HTTP error code for a missing resource. Most 404 pages are quite bland, simply stating that the requested resource was not found, and that's it. Back when I worked at NCST, I changed the default 404 page to use a local site search based on the requested URL. I used the namazu search engine since I was working on it at the time.

This time I decided to do something different. Instead of searching the local site for a missing resource, why not engage the user in trying to find missing kids?

I started by trying to find an API for missingkids.com and ended up finding missingkidsmap.com. This service takes the data from Missing Kids and puts it on a Google map. The cool thing about the service was that it could return data as XML.

Looking through the source code, I found the data URL:
http://www.missingkidsmap.com/read.php?state=CA
The state code is a two letter code for states in the US and Canada. To get all kids, just pass in ZZ as the state code.

The data returned looks like this:
<locations>
   <maplocation zoom="5"
                state_long="-119.838867"
                state_lat="37.370157"/>
   <location id="1"
             firstname="Anastasia"
             lastname=" Shearer "
             picture="img width=160 target=_new src=http://www.missingkids.com/photographs/NCMC1140669c1.jpg"
             picture2="img width=160 target=_new src=http://www.missingkids.com/photographs/NCMC1140669e1.jpg"
             medpic = "img width=60 border=0 target=_new src=http://www.missingkids.com/photographs/NCMC1140669c1.jpg"
             smallpic="img width=30 border=0 target=_new src=http://www.missingkids.com/photographs/NCMC1140669c1.jpg"
             policenum="1-661-861-3110"
             policeadd="Kern County Sheriff\'s Office (California)"
             policenum2=""
             policeadd2=""
             st=" CA"
             city="BAKERSFIELD"
             missing="12/26/2009"
             status="Endangered Runaway"
             age="16"
             url="1140669"
             lat="35.3733333333333"
             lng="-119.017777777778"/>
   ...
</locations>
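
If you would rather work with the XML directly, here is a minimal sketch of reading it with SimpleXML. parse_locations() is a hypothetical helper name, and the sample is trimmed from the response above so that the snippet runs without a network call. Note that some attributes (lastname, st) arrive padded with spaces, so the sketch trims every value.

```php
<?php
// Sample trimmed from the response shown above; a real call would fetch it over HTTP.
$xml = <<<XML
<locations>
   <maplocation zoom="5" state_long="-119.838867" state_lat="37.370157"/>
   <location id="1" firstname="Anastasia" lastname=" Shearer "
             medpic="img width=60 border=0 target=_new src=http://www.missingkids.com/photographs/NCMC1140669c1.jpg"
             st=" CA" city="BAKERSFIELD" missing="12/26/2009"
             status="Endangered Runaway" age="16"/>
</locations>
XML;

// parse_locations() is a hypothetical helper, not part of the original code.
// It returns one associative array per <location> element.
function parse_locations($xml)
{
   $doc = simplexml_load_string($xml);
   if ($doc === false) {
      return array();
   }

   $children = array();
   foreach ($doc->location as $loc) {
      $child = array();
      foreach ($loc->attributes() as $name => $value) {
         // Several fields (lastname, st) come padded with spaces.
         $child[$name] = trim((string)$value);
      }
      $children[] = $child;
   }
   return $children;
}

$children = parse_locations($xml);
echo $children[0]['firstname'], ' ', $children[0]['lastname'], ', ', $children[0]['st'], "\n";
```

Each entry ends up with the same keys as the attributes in the XML, which is also the shape the JSON route produces.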

Now I could keep hitting this URL for every 404, but I didn't want to kill their servers, so I decided to pass the URL through YQL and let them cache the data. Of course, now that I was passing it through YQL, I could also do some data transformation and get it out as JSON instead of XML. I ended up with this YQL statement:
SELECT * From xml
 Where url='http://www.missingkidsmap.com/read.php?state=ZZ'
Pass that through the YQL console to get the URL you should use. The JSON I got back looked like this:
{
   "query":{
      "count":"1",
      "created":"2010-02-19T07:30:44Z",
      "lang":"en-US",
      "updated":"2010-02-19T07:30:44Z",
      "uri":"http://query.yahooapis.com/v1/yql?q=SELECT+*+From+xml%0A+Where+url%3D%27http%3A%2F%2Fwww.missingkidsmap.com%2Fread.php%3Fstate%3DZZ%27",
      "results":{
         "locations":{
            "maplocation":{
               "state_lat":"40.313043",
               "state_long":"-94.130859",
               "zoom":"4"
            },
            "location":[{
                  "age":"7",
                  "city":"OMAHA",
                  "firstname":"Christopher",
                  "id":"Szczepanik",
                  "lastname":"Szczepanik",
                  "lat":"41.2586111111111",
                  "lng":"-95.9375",
                  "medpic":"img width=60 border=0 target=_new src=http://www.missingkids.com/photographs/NCMC1141175c1.jpg",
                  "missing":"12/14/2009",
                  "picture":"img width=160 target=_new src=http://www.missingkids.com/photographs/NCMC1141175c1.jpg",
                  "picture2":"",
                  "policeadd":"Omaha Police Department (Nebraska)",
                  "policeadd2":"",
                  "policenum":"1-402-444-5600",
                  "policenum2":"",
                  "smallpic":"img width=30 border=0 target=_new src=http://www.missingkids.com/photographs/NCMC1141175c1.jpg",
                  "st":" NE",
                  "status":"Missing",
                  "url":"1141175"
               },
               ...
            ]
         }
      }
   }
}
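
The YQL console hands you the finished URL, but it can also be put together in code. The sketch below is only an illustration: the public endpoint path and the format=json parameter are assumptions on my part (check them against what the console gives you), and yql_url() is a hypothetical helper name.

```php
<?php
// Assumption: the public YQL endpoint and the format=json parameter. The post
// itself only says to copy the final URL from the YQL console.
define('YQL_ENDPOINT', 'http://query.yahooapis.com/v1/public/yql');

// yql_url() is a hypothetical helper; $state is a two-letter code or ZZ for all.
function yql_url($state = 'ZZ')
{
   $data_url = 'http://www.missingkidsmap.com/read.php?state=' . urlencode($state);
   $yql = "SELECT * From xml Where url='$data_url'";

   // http_build_query() takes care of URL-encoding the statement.
   return YQL_ENDPOINT . '?' . http_build_query(array('q' => $yql, 'format' => 'json'));
}

$missing_kids_url = yql_url('ZZ');
echo $missing_kids_url, "\n";
```

Building the URL this way makes it easy to swap ZZ for a real state code once you know where the visitor is.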

Step 2 was to figure out whether the visitor was from the US or Canada, and if so, figure out which state they were from and pass that state code to the URL.

This is fairly easy to do at Yahoo!. Not so much on the outside, so I'm going to leave it to you to figure it out (and please let me know when you do).
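
Whatever geo lookup you end up with, the last step is the same: turn its answer into a state code, and fall back to ZZ for everyone else. A minimal sketch, assuming the lookup gives you a two-letter country code and a region code; state_for_visitor() and both of its parameters are hypothetical and not tied to any particular service.

```php
<?php
// state_for_visitor() is a hypothetical helper. $country and $region are
// assumed to come from whatever geo lookup you use, e.g. "US" and "CA".
function state_for_visitor($country, $region)
{
   $country = strtoupper(trim((string)$country));
   $region  = strtoupper(trim((string)$region));

   // Only US and Canadian state codes mean anything to the data URL.
   if (in_array($country, array('US', 'CA'), true) && preg_match('/^[A-Z]{2}$/', $region)) {
      return $region;
   }
   return 'ZZ';   // ZZ asks for all kids
}

echo state_for_visitor('US', 'ca'), "\n";   // CA
echo state_for_visitor('IN', 'MH'), "\n";   // ZZ
echo state_for_visitor('CA', ''), "\n";     // ZZ
```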

In any case, my code looked like this:
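$json = http_get($missing_kids_url);
$o = json_decode($json, true);   // true: decode objects as associative arrays
$children = $o['query']['results']['locations']['location'];

// array_rand() returns a random key, so use it to index into the list
$child = $children[array_rand($children)];

print_404($child);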
http_get is a function I wrote that wraps curl_multi to fetch a URL and cache it locally. print_404 is the function that prints out the HTML for the 404 page using the $child data object. The object's structure is the same as each of the location elements in the JSON above. The important parts of print_404 are:
function print_404($child)
{
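   // medpic looks like "img width=60 ... src=http://...", so keep only the URL
   $img = preg_replace('/.*src=(.*)/', '$1', $child["medpic"]);
   $name = trim($child["firstname"]) . " " . trim($child["lastname"]);
   $age = $child['age'];
   // missing is MM/DD/YYYY; rewrite it as YYYY-MM-DD before handing it to strtotime
   $since = strtotime(preg_replace('|(\d\d)/(\d\d)/(\d\d\d\d)|', '$3-$1-$2', $child['missing']));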
   if($age == 0) {
      $age = ceil((time()-$since)/60/60/24/30);
      $age .= ' month';
   }
   else
      $age .= ' year';

   $city = $child['city'];
   $state = $child['st'];
   $status = $child['status'];
   $police = $child['policeadd'] . " at " . $child['policenum'];

   header('HTTP/1.0 404 Not Found');
?>
<html>
<head>
...
<p>
<strong>Sorry, the page you're trying to find is missing.</strong>
</p>
<p>
We may not be able to find the page, but perhaps you could help find this missing child:
</p>
<div style="text-align:center;">
<img style="width:320px; padding: 1em;" alt="<?php echo $name ?>" src="<?php echo $img ?>"><br>
<div style="text-align: left;">
<?php echo $age ?> old <?php echo $name ?>, from <?php echo "$city, $state" ?> missing since <?php echo strftime("%B %e, %Y", $since); ?>.<br>
<strong>Status:</strong> <?php echo $status ?>.<br>
<strong>If found, please contact</strong> <?php echo $police ?><br>
</div>
</div>
...
</body>
</html>
<?php
}
Add in your own CSS and page header, and you've got missing kids on your 404 page.
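
The http_get helper is not shown in this post. As a rough stand-in, here is a minimal sketch of a fetch-and-cache function. It is not the original: it uses a plain file cache in the system temp directory instead of curl_multi, and the name cached_get(), the cache file naming and the one hour default are all assumptions. The fetcher is passed in as a callable so you can swap in curl, or a fake one for testing.

```php
<?php
// cached_get() is a hypothetical stand-in for the http_get() described above.
// $fetch is any callable that takes a URL and returns the body as a string,
// or false on failure. The default needs allow_url_fopen to be enabled.
function cached_get($url, $ttl = 3600, $fetch = 'file_get_contents')
{
   $cache_file = sys_get_temp_dir() . '/404kids-' . md5($url) . '.cache';

   // Serve from the cache while it is still fresh.
   if (is_file($cache_file) && time() - filemtime($cache_file) < $ttl) {
      return file_get_contents($cache_file);
   }

   $body = call_user_func($fetch, $url);
   if ($body === false) {
      // The fetch failed: fall back to a stale copy if there is one.
      return is_file($cache_file) ? file_get_contents($cache_file) : false;
   }

   file_put_contents($cache_file, $body, LOCK_EX);
   return $body;
}
```

With this in place, the first line of the earlier snippet would become $json = cached_get($missing_kids_url); and a 404 would only hit the remote service once the cached copy has expired.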

The last thing to do is to tell Apache to use this script as your 404 handler. To do that, put the page (I call it 404.php) into your document root, and put this into your Apache config (or in a .htaccess file):
ErrorDocument 404 /404.php
Restart Apache and you're done.

Update: 2010-02-24 To see it in action, visit a missing page on my website, e.g. http://bluesmoon.info/foobar.

Update 2: The code is now on GitHub: http://github.com/bluesmoon/404kids

Update: 2010-02-25 Scott Hanselman has a JavaScript implementation on his blog.

Update: 2010-03-28 There's now a Drupal module for this.

22 comments :

Brian
February 24, 2010 11:13 AM

Can you show us what the result looks like?

Philip
February 24, 2010 12:12 PM

@Brian, thanks for catching that. You can try visiting a missing page on my website. eg: http://bluesmoon.info/foobar

I'll update the post.

Anonymous
February 24, 2010 12:18 PM

When viewing in Firefox 3.5.8 for Ubuntu, the right half of the kid's face is truncated.

chanux
February 24, 2010 12:25 PM

Any way to show more relevant info according to the IP of the visitor? That would be more helpful.

Nice idea though.

Philip
February 24, 2010 12:39 PM

@Anonymous: it should be fixed now. Thanks for the report.

@chanux: yeah, you could do a geo lookup on the IP. There are many services that can tell you the country. Rasmus has a good API at http://geoip.pidgets.com/ that can be used for this. I'll update my page later today to use it.

Unknown
February 24, 2010 12:43 PM

wow, awesome awesome idea. adding the geo ip lookup for a more relevant result based on where the user is would be the icing on the cake.

Nick Fitzgerald
February 24, 2010 12:54 PM

I posted this comment on Hacker News, thought I would repost:

This is a really cool idea, I would like to have a drop in, embeddable <script> widget type thing to add this to my 404 page. I just don't have time to spare to port this code to django/python, a simple script will work on any platform.

Anyone care to spend a little time on this? Possibly cleaning up the design to be more aesthetically pleasing and possibly multiple themes/color schemes?

Again, really cool, I like it. Does some good and is cute at the same time.

mark
February 24, 2010 5:17 PM

Awesome idea! Why not get some funding from children's charities / NGOs like http://www.unicef.org/ ?

I'm sure larger web agencies would jump at the chance to help you out

It would be great if there were plugins / extensions for Joomla, Drupal, WordPress etc. Then people who are not backend coders could also do their part to help find missing kids

Eliot Sykes
February 24, 2010 5:38 PM

What do you think about approaching Mozilla, Google, Microsoft, Apple and getting them to make this a default 404 page served by their browsers?

Philip
February 25, 2010 5:18 AM

@eliot: I added the geoip lookup, but it was too slow so I've dropped it for now (commented out, so you can still see it in the source).

@Nick: I can see how a script tag would be useful for some people. I personally prefer not to go down that path for two reasons. First, it makes the page inaccessible to people with JavaScript turned off. This may not be an issue for others, but it is for me. Second, since this is mainly for 404 pages, anyone putting it onto a 404 page would have to know how to edit their 404 page, in which case they can do a lot more than just a script node.

That said, I think it makes a lot of sense to build a reusable widget/badge in JavaScript that people can just stick onto their WordPress/Blogger/Movable Type blogs. Feel free to work off the code on GitHub.

@mark: I don't require funding. I just need volunteers to help build all these plugins. You volunteering? ;)

@Eliot: It's worth a try.

Sundar
February 25, 2010 5:24 AM

Amazing idea. Must be adopted by the platforms.

dangiankit
February 25, 2010 6:00 AM

I'll reiterate what I've just tweeted.

This idea of yours is amazing. In fact, I am amazed to see that there is such a database providing APIs for others to connect to. Perhaps that's the strong IT infrastructure that these countries have.

I wish the Indian Police could take efforts to build up an integrated missing kids database, and then using your idea, apply it to all Indian Govt. websites. There sure is lots of scope.

dangiankit
February 25, 2010 6:28 AM

A quick search for Missing Indian Kids led to http://www.missingindiankids.com/help/javaban.htm

They have Java banners (using Applets), links, logos etc. Perhaps, you could embed the applet into your post. :-)

Davin Studer
February 25, 2010 11:26 AM

I like the idea! Scott Hanselman has made a completely client side version of your idea.

http://goo.gl/fb/YW9w

Evan Cooper
February 25, 2010 6:48 PM

All the missing kids gonna steal all ma traffic! I don't think so! There's no ROI in finding missing kids! Unless those missing kids are like "Super thx 4 finding me, now go buy something on that site that had my picture on it." Affiliate marketing! Now that's what I'M talking about.

Unknown
February 26, 2010 12:11 PM

Just wanted to let anyone interested know that I put together a js widget version that is available here:

http://www.missingchildren404.org

So now you can easily drop it into your site if you'd like. And everything including images is cached so it should be pretty scalable.

I'm open to any and all feedback at eliot dot k at gmail.

Philip
February 26, 2010 1:04 PM

@eliot: I haven't looked into how you do the geolookup, but I'm assuming you use the full IP address. I was wondering if it would be less stressful on the service if you just used the first three bytes of the IP and zeroed out the fourth. This would effectively reduce the total number of potential IPs from 4 billion to 16 million, which should also limit the number of times you need to call the API.

I'd also pass the API through YQL for caching.
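
A minimal sketch of that masking, assuming IPv4 addresses only. mask_ip() is a hypothetical helper and not code from the 404kids repository; the idea is to use the masked address, rather than the full one, as the key for the geo lookup and its cache.

```php
<?php
// mask_ip() is a hypothetical helper; it handles IPv4 addresses only and
// returns false for anything else.
function mask_ip($ip)
{
   if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
      return false;
   }
   $bytes = explode('.', $ip);
   $bytes[3] = '0';            // zero out the fourth byte
   return implode('.', $bytes);
}

echo mask_ip('203.0.113.57'), "\n";    // 203.0.113.0
echo mask_ip('203.0.113.200'), "\n";   // 203.0.113.0
```

Both addresses collapse to the same key, so the second visitor would be served from the cached lookup.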

Philip
February 26, 2010 2:06 PM

ok, I've now added a geolookup. It probably needs a separate post to explain how I did it with one API call.

Adrianus Warmenhoven
March 08, 2010 7:20 PM

Heya,

I have taken your idea and modified it for the Dutch Amber Alert.

The Google 'project' can be found at http://code.google.com/p/amberalert404/

And of course I give credit where it is due.

carnaporto
June 25, 2010 2:20 PM

Brilliant idea, would love to see it on some live sites.
