After a month of testing, with many bug fixes and considerable accuracy improvements, the RSS to GeoRSS converter is officially released today. There is still a lot of room for improvement and we will continue to work on it, but we think it has now reached a sufficient level of accuracy to be released.
A number of bloggers have already written about this new service. Here is a list of some of them:
ExploreOurPla.net will soon release a GeoRSS viewer on top of their already remarkable interface. The viewer is currently in beta, but it is nevertheless the most sophisticated GeoRSS viewer we know of. If you want to check it out, and I am sure you do, just append the RSS feed URL to the path ‘http://exploreourpla.net/georss/’. ExploreOurPla.net will call the geonames converter with the URL and display the feed.
Here is an example with Google News about “flu”: http://exploreourpla.net/georss/http://news.google.com/news?hl=en&output=rss&q=flu
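The URL scheme is simple enough to sketch in a few lines of Python. This is only an illustration of the "append the feed URL to the path" rule; the function name `viewer_url` is made up for this example, and the feed URL is appended unencoded because that is what the example link does.

```python
VIEWER_BASE = "http://exploreourpla.net/georss/"

def viewer_url(feed_url: str) -> str:
    """Append the feed URL to the viewer path, unencoded, as in the example link."""
    return VIEWER_BASE + feed_url

# Rebuild the Google News "flu" example.
print(viewer_url("http://news.google.com/news?hl=en&output=rss&q=flu"))
```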
Reuters “World News” is already in the default profile. You can create a login with ExploreOurPla.net and create your own list of your favorite geo feeds.
An important feature for us here at geonames is the feedback menu for each feed entry. There are two options: ‘ok’ and ‘nok’ (for “not ok”). A click on one of them will result in a callback to the geonames server. We will store your feedback in our database and use it to improve our conversion algorithm. We hope to be able to officially release the RSS to GeoRSS converter pretty soon. Below is a screenshot of the ‘geoFeedExplorer’; I have underlined the feedback menus and marked them with a red arrow.
One of the most interesting parts of the 'natural language geocoder' is place name disambiguation. Depending on the context, the grammatical structure or the language, a term may have one of several possible meanings. Examples:
- Hayden : the CIA director Michael Hayden or the city in Idaho.
- Java : the island or the programming language
- Brisbane : city in Australia or city in California, USA
- Como : city in Italy or a very frequent word in Spanish.
We are using several processing steps to tackle this problem. First we identify the language of the text and the context (example: in an IT context the term 'Java' most likely stands for the programming language).
Then we try to find person names and do some simple grammatical analysis using co-occurrences of left and right neighbours (example: if the term we are looking at is preceded by the expression 'south of', we can be nearly certain that the term has a geographical meaning).
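To make these steps a bit more concrete, here is a toy sketch in Python. It is not the geocoder's actual code: the cue lists, the function name `classify` and the order of the checks are all assumptions chosen for illustration, and a real system would need far larger word lists and a proper language identification step.

```python
import re

# Hypothetical cue lists, for illustration only.
GEO_CUES = {("south", "of"), ("north", "of"), ("east", "of"), ("west", "of")}
FIRST_NAMES = {"michael", "mary", "james"}
IT_TERMS = {"java"}
IT_CONTEXT = {"programming", "software", "compiler", "code"}

def classify(text: str, term: str) -> str:
    """Guess whether `term` is a 'place', a 'person', 'other' or 'unknown'."""
    tokens = re.findall(r"[a-z]+", text.lower())
    target = term.lower()
    if target not in tokens:
        return "absent"
    i = tokens.index(target)
    left = tokens[max(0, i - 2):i]  # up to two left neighbours
    # A left neighbour such as 'south of' is a strong geographic cue.
    if tuple(left) in GEO_CUES:
        return "place"
    # A first name directly before the term suggests a person.
    if left and left[-1] in FIRST_NAMES:
        return "person"
    # Topical context: an IT vocabulary suggests the non-geographic sense.
    if target in IT_TERMS and set(tokens) & IT_CONTEXT:
        return "other"
    return "unknown"

print(classify("The village lies south of Hayden.", "Hayden"))
print(classify("CIA director Michael Hayden testified.", "Hayden"))
print(classify("Java is a popular programming language.", "Java"))
```

The sketch checks the 'south of' cue first because that signal is described as nearly certain; when none of the cues fire, it deliberately answers 'unknown' instead of guessing.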
I am currently looking at first names in order to improve the 'natural language geocoder'. It is curious to see that it takes 4200 girls' first names to cover 90% of the population, but only 1200 boys' names to reach the same 90% coverage. The reason for this huge difference is mainly found in the top positions. The ten most popular male names reach about 23%, whereas the ten most popular female names reach a comparatively meager 10.7%.
The question is: why do parents look for variety in a girl's name, and far less so in a boy's? Any ideas?
Source : US Census
PS, just in case you want to know the most popular names :
Female first names (freq and cum.freq in %):
name        freq    cum.freq  rank
MARY        2.629    2.629     1
PATRICIA    1.073    3.702     2
LINDA       1.035    4.736     3
BARBARA     0.980    5.716     4
ELIZABETH   0.937    6.653     5
JENNIFER    0.932    7.586     6
MARIA       0.828    8.414     7
SUSAN       0.794    9.209     8
MARGARET    0.768    9.976     9
DOROTHY     0.727   10.703    10

Male first names (freq and cum.freq in %):
name        freq    cum.freq  rank
JAMES       3.318    3.318     1
JOHN        3.271    6.589     2
ROBERT      3.143    9.732     3
MICHAEL     2.629   12.361     4
WILLIAM     2.451   14.812     5
DAVID       2.363   17.176     6
RICHARD     1.703   18.878     7
CHARLES     1.523   20.401     8
JOSEPH      1.404   21.805     9
THOMAS      1.380   23.185    10
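For anyone who wants to play with the numbers, the cumulative column and the coverage question can be reproduced with a short Python sketch. It uses only the top-ten frequencies listed above; the helper name `names_needed` is made up for this example, and with only ten names per list it naturally cannot reach the 90% mark.

```python
from itertools import accumulate

# Frequencies in percent, copied from the lists above.
FEMALE = [2.629, 1.073, 1.035, 0.980, 0.937, 0.932, 0.828, 0.794, 0.768, 0.727]
MALE = [3.318, 3.271, 3.143, 2.629, 2.451, 2.363, 1.703, 1.523, 1.404, 1.380]

def names_needed(freqs, target):
    """Return how many top names are needed to reach `target` percent, or None."""
    for rank, total in enumerate(accumulate(freqs), start=1):
        if total >= target:
            return rank
    return None  # the list is too short to reach the target

print(round(sum(FEMALE), 3), round(sum(MALE), 3))
print(names_needed(MALE, 10.0), names_needed(FEMALE, 10.0))
```

With the full census lists in place of the top ten, the same function would give the number of names needed for 90% coverage.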
The brand new Geonames RSS-to-GeoRSS converter reads each entry in an RSS feed and tries to determine a geo location for the entry using a modified version of the Geonames full text search. If a geo location is found, its latitude and longitude are added to the feed entry in GeoRSS format. It works for any RSS feed: just pass the feed URL as a parameter to the converter and your feed entries will get latitude and longitude in real time.
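As a rough illustration of what a consumer of the converted feed might do, the following Python sketch reads the location back out of each entry. It assumes the GeoRSS Simple encoding (a `georss:point` element holding "latitude longitude"); which GeoRSS encoding the converter actually emits is not stated here, and the sample feed and its coordinates are invented for the example.

```python
import xml.etree.ElementTree as ET

GEORSS_NS = "http://www.georss.org/georss"

# Invented sample data; a real converted feed may look different.
SAMPLE_FEED = """<rss version="2.0" xmlns:georss="http://www.georss.org/georss">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>Flooding reported in Brisbane</title>
      <georss:point>-27.47 153.03</georss:point>
    </item>
    <item>
      <title>Entry without a location</title>
    </item>
  </channel>
</rss>"""

def entry_locations(feed_xml):
    """Return (title, (lat, lon)) per item, with None where no point was added."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        point = item.findtext(f"{{{GEORSS_NS}}}point")
        if point is None:
            results.append((title, None))
            continue
        lat, lon = (float(value) for value in point.split())
        results.append((title, (lat, lon)))
    return results

for title, location in entry_locations(SAMPLE_FEED):
    print(title, location)
```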
Here is the Reuters WorldNews feed enriched with lat/lng by Geonames and displayed on the Acme GeoRSS Mapper. It works like a Unix pipe: the original Reuters feed is piped through the Geonames converter and the result is displayed on the Acme viewer. You could imagine piping the result through a translation service, through a filter service, through a sort service, merging it with another feed and so on. In the end you have the original feed translated into your language, filtered by some keywords, enriched with geo information and sorted by distance to your home address. I digress; let's go back to our service.
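The pipe idea can be sketched in Python as a chain of small stages that each take a list of entries and return a new list. Everything here is hypothetical: the stage names, the dictionary layout of an entry and the sample entries are assumptions for illustration, and the stages run locally instead of calling real web services.

```python
import math
from functools import reduce

def keyword_filter(keyword):
    """Stage: keep entries whose title contains the keyword."""
    def stage(entries):
        return [e for e in entries if keyword.lower() in e["title"].lower()]
    return stage

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def sort_by_distance(home_lat, home_lon):
    """Stage: order entries by distance to a home location."""
    def stage(entries):
        return sorted(
            entries,
            key=lambda e: haversine_km(home_lat, home_lon, e["lat"], e["lon"]),
        )
    return stage

def pipe(entries, *stages):
    """Feed the entries through each stage in turn, like a Unix pipe."""
    return reduce(lambda acc, stage: stage(acc), stages, entries)

# Invented, already geocoded sample entries.
ENTRIES = [
    {"title": "Flu season starts in Sydney", "lat": -33.87, "lon": 151.21},
    {"title": "Election results in Rome", "lat": 41.89, "lon": 12.48},
    {"title": "Flu cases rise in Paris", "lat": 48.86, "lon": 2.35},
]

result = pipe(ENTRIES, keyword_filter("flu"), sort_by_distance(47.37, 8.54))
print([e["title"] for e in result])
```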
In addition to adding GeoRSS information, the converter is also able to convert from one RSS dialect to another. This is useful if the converted feed with the geo information should be displayed with a GeoRSS viewer which does not support the RSS dialect of the original feed. Just add the parameter type to the URL with the target RSS dialect.
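Building such a request URL could look like the Python sketch below. Only the parameter name `type` comes from the text above; the base URL is a placeholder, the parameter name `feedUrl` and the dialect value `atom` are assumptions, so check the converter's documentation for the real names and values.

```python
from urllib.parse import urlencode

# Placeholder address, NOT the real converter endpoint.
CONVERTER_BASE = "http://converter.example/rssToGeoRSS"

def converter_url(feed_url, target_type=None):
    """Build a converter request URL, optionally asking for a target dialect."""
    params = {"feedUrl": feed_url}  # 'feedUrl' is an assumed parameter name
    if target_type is not None:
        params["type"] = target_type  # 'type' is the parameter named in the text
    return CONVERTER_BASE + "?" + urlencode(params)

print(converter_url("http://example.org/feed.rss", "atom"))
```

Passing the feed URL through `urlencode` keeps its own query string from being confused with the converter's parameters.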
The converter has to take the language of the feed into account. Supported languages are English, German, Spanish, Italian and French. The automatic language detection of the converter is rudimentary and only uses the top level domain of the feed to determine the language (.de > de, .it > it, …). The parameter lang can be used to pass a language and circumvent the automatic language detection.
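A minimal Python sketch of this kind of detection follows. The .de and .it mappings are the ones mentioned above; the .es and .fr mappings, the fallback to English and the function name `detect_language` are assumptions made for the example.

```python
from urllib.parse import urlsplit

# .de and .it come from the text; .es and .fr are assumed by analogy.
TLD_TO_LANG = {"de": "de", "it": "it", "es": "es", "fr": "fr"}

def detect_language(feed_url, lang=None, default="en"):
    """Pick a language from the feed's top level domain unless `lang` is given."""
    if lang is not None:
        return lang  # an explicit lang bypasses the automatic detection
    host = urlsplit(feed_url).hostname or ""
    tld = host.rsplit(".", 1)[-1].lower()
    return TLD_TO_LANG.get(tld, default)  # falling back to English is an assumption

print(detect_language("http://www.example.de/news.rss"))
print(detect_language("http://www.example.com/news.rss", lang="fr"))
```

The second call shows why the override matters: a French feed hosted on a .com domain would otherwise be treated as English.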
This is a very first release of this service and there is a lot of room for improvement. Don't hesitate to use the comments form below for your ideas and feedback.