Place Name Disambiguation

One of the most interesting parts of the 'natural language geocoder' is place name disambiguation. Depending on the context, the grammatical structure, or the language, a term may have one of several possible meanings. Examples:

  • Hayden: the CIA director Michael Hayden or the city in Idaho
  • Java: the island or the programming language
  • Brisbane: the city in Australia or the city in California, USA
  • Como: the city in Italy or a very frequent word in Spanish

We use several processing steps to tackle this problem. First, we identify the language of the text and its context (for example, in an IT context the term Java most likely refers to the programming language).
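To make this first step concrete, here is a minimal Python sketch, assuming a toy approach rather than the geocoder's actual implementation: the language is guessed by counting stopwords, and an ambiguous term is resolved by counting overlaps with per-sense context keywords. The stopword lists, the `SENSES` keyword table, and all function names are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical stopword lists; a real language identifier would use far larger
# lists or character n-gram statistics.
STOPWORDS = {
    "en": {"the", "and", "of", "is", "in", "to", "a", "with"},
    "es": {"el", "la", "de", "que", "y", "en", "los", "como", "es"},
    "it": {"il", "di", "che", "e", "la", "per", "un", "non"},
}

# Hypothetical context keywords for each sense of an ambiguous term.
SENSES = {
    "java": {
        "island": {"indonesia", "jakarta", "island", "volcano", "rice"},
        "programming language": {"code", "class", "compiler", "jvm", "software"},
    },
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def identify_language(tokens):
    """Return the language whose stopword list matches the most tokens."""
    scores = {lang: sum(t in words for t in tokens) for lang, words in STOPWORDS.items()}
    return max(scores, key=scores.get)


def is_likely_common_word(term, lang):
    """True if the term is an ordinary function word in the detected language (e.g. 'como' in Spanish)."""
    return term.lower() in STOPWORDS.get(lang, set())


def disambiguate_by_context(term, tokens):
    """Pick the sense whose context keywords overlap most with the text, or None."""
    senses = SENSES.get(term.lower())
    if not senses:
        return None
    token_set = set(tokens)
    scores = {sense: len(token_set & keywords) for sense, keywords in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

For example, in "The compiler turned the Java class into bytecode for the JVM" the English stopwords win and the keywords "compiler", "class" and "jvm" point to the programming language, while in a Spanish sentence starting with "Como" the word is flagged as a likely common word rather than the Italian city.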

Then we try to find person names and perform some simple grammatical analysis using co-occurrences of left and right neighbours (for example, if the term is preceded by the expression 'south of', we can be nearly certain that it has a geographical meaning).
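The neighbour analysis can be sketched in the same spirit. This is a minimal Python illustration under stated assumptions, not the geocoder's actual rules: each occurrence of a term is classified from its immediate left and right neighbours, checking person cues first and then geographic cues such as 'south of'. All cue lists and function names are hypothetical; a real system would derive such cues from co-occurrence counts over a large corpus.

```python
import re

# Hypothetical cue lists; a real system would learn these from co-occurrence counts.
GEO_LEFT_BIGRAMS = {"south of", "north of", "east of", "west of", "city of"}
GEO_LEFT_WORDS = {"near", "in"}
GEO_RIGHT_WORDS = {"idaho", "california", "italy", "australia"}
PERSON_LEFT_WORDS = {"director", "mr", "general", "michael"}  # titles and known first names
PERSON_RIGHT_WORDS = {"said", "told", "announced"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (punctuation is dropped)."""
    return re.findall(r"\w+", text.lower())


def classify_occurrence(tokens, i):
    """Guess whether tokens[i] is a 'person', a 'place' or 'unknown' from its neighbours."""
    left1 = tokens[i - 1] if i >= 1 else ""
    left2 = " ".join(tokens[i - 2:i]) if i >= 2 else ""
    right1 = tokens[i + 1] if i + 1 < len(tokens) else ""
    # Person cues are checked first, following the order described above:
    # find person names, then look for geographic context.
    if left1 in PERSON_LEFT_WORDS or right1 in PERSON_RIGHT_WORDS:
        return "person"
    if left2 in GEO_LEFT_BIGRAMS or left1 in GEO_LEFT_WORDS or right1 in GEO_RIGHT_WORDS:
        return "place"
    return "unknown"


def classify_term(text, term):
    """Classify every occurrence of term in text."""
    tokens = tokenize(text)
    term = term.lower()
    return [classify_occurrence(tokens, i) for i, t in enumerate(tokens) if t == term]
```

With these cues, "The town lies south of Hayden" yields `["place"]`, "CIA director Michael Hayden said on Tuesday" yields `["person"]`, and a sentence with no cues around the term is left as `"unknown"` so that the other processing steps can decide.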



