Identifying geographic locations in text
What kind of work has been done to determine whether a specific string pertains to a geographical location? For example:
'troy, ny'
'austin, texas'
'hotels in las vegas, nv'
I guess what I'm expecting is a statistical approach that gives a degree of confidence that the first two are locations. The last one would probably require a heuristic that grabs the "%s, %s" pattern and then applies the same technique. I'm specifically looking for approaches that don't rely too heavily on the preposition 'in', since it is neither an unambiguous nor a consistently available indicator of location.
Can anyone point me to approaches, papers, or existing utilities? Thanks!
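To make the idea concrete, here is a rough sketch of the "%s, %s" heuristic described above. It is not a real statistical model: the gazetteer sets, the `location_score` function, and the scoring rule (the share of words before the comma that belong to a known place name) are all assumptions made up for illustration.

```python
import re

# Toy gazetteer (an assumption for illustration); a real one would be
# loaded from a much larger resource.
CITIES = {"troy", "austin", "las vegas"}
REGIONS = {"ny", "new york", "tx", "texas", "nv", "nevada"}

# Matches a trailing "<something>, <something>" pair.
PAIR = re.compile(r"([a-z .'-]+),\s*([a-z .]+)$")

def location_score(text):
    """Return (candidate, score); score in [0, 1] is a crude confidence."""
    match = PAIR.search(text.lower().strip())
    if not match:
        return None, 0.0
    left, region = match.group(1).strip(), match.group(2).strip()
    if region not in REGIONS:
        return None, 0.0
    # Try progressively shorter suffixes of the left part, so that
    # "hotels in las vegas" yields "las vegas" without relying on "in".
    words = left.split()
    for start in range(len(words)):
        city = " ".join(words[start:])
        if city in CITIES:
            # Leftover words that are not part of the place name lower the score.
            score = (len(words) - start) / len(words)
            return f"{city}, {region}", score
    return None, 0.0

for query in ("troy, ny", "austin, texas", "hotels in las vegas, nv"):
    print(query, "->", location_score(query))
```

With this toy data the first two strings score higher than the third, because the third contains extra words that are not part of the place name.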
Comments (4)
The problem you describe is often called geographic query parsing or more generally geographic information retrieval.
There was a recent task on doing this at CLEF 2007 (http://www.uni-hildesheim.de/geoclef/2007/Query-Parsing.htm). The winning team used a rule-based grammar, which is probably the kind of approach you want to avoid. Another paper, at WWW 2009, discusses GeoParser: http://www2009.eprints.org/239/.
There are also some papers on Geographic Information Retrieval at CIKM 2007: http://www.geo.unizh.ch/~rsp/gir07/accepted.html
I don't know of any open source software that does this, but it may be bundled into a search engine like Lemur.
Everyblock.com takes a very interesting approach that focuses on how locations are expressed in English: they basically use some sophisticated and extensive regular expressions, which are now open source. Their application is designed to scan through news articles, reviews, and various public data feeds and relate them to specific locations, and it works well. Expressions like "A fire in the building on the North-East corner of 20th and Valencia St. in San Francisco" are geocoded very accurately. The source is available to study. The particular part you probably want is ebpub/ebpub/geocoder/base.py, located in the ebpub download, and everything around it, for example starting with the SmartGeocoder class and working backwards.
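As a rough illustration of the regular-expression style of approach (this is not taken from the ebpub source; the pattern, the suffix list, and the `find_intersections` helper are hypothetical and far simpler than a production geocoder would need), a street intersection could be pulled out of free text like this:

```python
import re

# Hypothetical building blocks, not the ebpub patterns:
# a street name is either an ordinal ("20th") or a capitalised word.
STREET = r"(?:\d+(?:st|nd|rd|th)|[A-Z][a-z]+)"
SUFFIX = r"(?:St|Ave|Blvd|Rd)\.?"

# "<street> [suffix] and <street> [suffix]"
INTERSECTION = re.compile(
    rf"\b({STREET})(?:\s+{SUFFIX})?\s+(?:and|&)\s+({STREET})(?:\s+{SUFFIX})?"
)

def find_intersections(text):
    """Return a list of (street_a, street_b) tuples found in text."""
    return [(m.group(1), m.group(2)) for m in INTERSECTION.finditer(text)]

sentence = ("A fire in the building on the North-East corner of "
            "20th and Valencia St. in San Francisco")
print(find_intersections(sentence))
```

The extracted pair would still have to be resolved against street data for a specific city before it could be turned into coordinates.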
A link that might help: the geonames.org search service.
example: http://ws.geonames.org/search?q=troy,%20ny&maxRows=10
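A small sketch of building that kind of query URL from an arbitrary string. The host, path, and the `q` and `maxRows` parameters are copied from the example URL; the `geonames_search_url` helper is made up for illustration, and whether the service still answers at this address or needs extra parameters (such as an account name) is not verified here, so check the GeoNames documentation before relying on it.

```python
from urllib.parse import urlencode

# Copied from the example URL above; availability is an assumption.
BASE_URL = "http://ws.geonames.org/search"

def geonames_search_url(query, max_rows=10):
    """Build a search URL for a free-text place query."""
    params = {"q": query, "maxRows": max_rows}
    # urlencode escapes commas and spaces so the query is safe in a URL.
    return f"{BASE_URL}?{urlencode(params)}"

print(geonames_search_url("troy, ny"))
```

The candidate places returned by the service could then be used as evidence that the query string names a location.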
I'm building a free geoparser at geocode.xyz
(currently supports about 50 European countries, soon to offer global coverage)
A sample application of geoparsing can be found on OpenWikiMap