Motivation

Geocaching[1] is all about finding things. Sometimes they are hard to find because they've been hidden cleverly, whilst at other times one has to solve a puzzle before starting to look. I'm a big fan of both sorts of cache, but sometimes there's a third problem which I find less enjoyable to tackle: the geocache description might be in a language I don't understand.

Resolution

Groundspeak offer a Pocket Query service to premium subscribers which lets them specify a search, e.g. for caches near the centre of Prague. The list of caches matching the search is then encoded in GPX (an XML application). This file can be viewed with e.g. Google Earth, and a caching extravaganza planned.

However, it's a bit tricky to plan the trip if many of the cache descriptions are unintelligible to me. It would be much more convenient if I could process the GPX file automatically, selecting only those caches which have a reasonable amount of English in their description.
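To make the idea concrete, here is a minimal sketch of such a filter in Python. It is not the Perl discussed in this post, just an illustration under a few assumptions: that each cache is a `wpt` element directly under the GPX root, and that its text lives in elements whose local names are `short_description` and `long_description`. The function name `keep_english_waypoints` and the `score` callable (any function mapping text to a number between 0 and 1) are made up for the example.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def keep_english_waypoints(root, score, threshold=0.25):
    """Remove waypoints whose description scores at or below threshold.

    root      -- the parsed <gpx> element; it is modified in place
    score     -- hypothetical callable: description text -> fraction in [0, 1]
    threshold -- 0.25 means "more than a quarter in English"
    Returns the number of waypoints kept.
    """
    kept = 0
    for wpt in list(root):  # copy the child list, since we remove as we go
        if local_name(wpt.tag) != "wpt":
            continue
        # Assumed element names; matched by local name so that the exact
        # namespace URIs in the file do not matter.
        description = " ".join(
            el.text or ""
            for el in wpt.iter()
            if local_name(el.tag) in ("short_description", "long_description")
        )
        if score(description) > threshold:
            kept += 1
        else:
            root.remove(wpt)
    return kept


# Possible usage, given some scoring function my_score:
#   tree = ET.parse("Prague.gpx")
#   keep_english_waypoints(tree.getroot(), my_score)
#   tree.write("Prague-en.gpx", encoding="utf-8", xml_declaration=True)
```

Matching on local names keeps the sketch independent of the namespace versions in any particular file. One caveat: ElementTree rewrites namespace prefixes on output unless they are registered first, so the result may look different from the input even where nothing was removed.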

Happily, and not entirely coincidentally, I recently wrote about a toy language model[2] which is fine for solving this sort of problem.
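The language model itself isn't reproduced here. As a much cruder stand-in, the Python sketch below estimates how English a piece of text is by counting very common English words. Both the word list and the name `english_fraction` are assumptions made for illustration, and the number it returns is only a rough proxy for "how much of the description is in English".

```python
import re

# Hypothetical, hand-picked list of common English words. Words which are
# also common in the other languages (e.g. "a", "to", "in") are left out.
COMMON_ENGLISH = frozenset(
    "the of and is it you that for are with this have from not but what "
    "were when your can there which their if each about how them then "
    "she many some".split()
)


def english_fraction(text):
    """Return the fraction of words in text found in COMMON_ENGLISH."""
    # [^\W\d_]+ matches runs of letters, including accented ones.
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in COMMON_ENGLISH)
    return hits / len(words)
```

A heuristic like this is easily fooled by short descriptions, and by HTML markup if the description contains any, so a model trained on sample text from each language should do rather better.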

Software

You can download some Perl[3] which does all this, at least if the only languages in play are English, French, German, Spanish, Italian, and Czech.

For example, here's a demo:

$ wget http://www.mjoldfield.com/atelier/2010/06/glf/gc-lang-filter_0.1.tar.gz
$ tar xzvf gc-lang-filter_0.1.tar.gz
$ cd gc-lang-filter-0.1
$ ./gc-lang-filter Prague.gpx
$ # only on Mac OS
$ open -a 'Google Earth' Prague-en.gpx

Results

Starting from 1,000 caches in Prague and selecting those with more than a quarter of the description in English produced a list of 261. Changing the threshold changes this number a bit, but not too much.

All 261 caches seem to be worth investigating, though of course whether I actually find any remains to be seen!