Entity resolution for venues and other geographic locations
Say I want to build a check-in aggregator that counts visits across platforms, so that I can know for a given place how many people have checked in there on Foursquare, Gowalla, BrightKite, etc. Is there a good library or set of tools I can use out of the box to associate the venue entries in each service with a unique place identifier of my own?
I basically want a function that maps a pair of (placename, address, lat/long) tuples to a confidence score in [0,1) that they refer to the same real-world location.
Someone must have done this already, but my google-fu is weak.
Comments (2)
Yes, you can submit the two addresses using geocoder.net (assuming you're a .NET developer; you didn't say). It provides a common interface for address verification and geocoding, so you can be reasonably sure that one address is the same as another.
If you can't get them to standardize and match, you can compare the distance between them and assume they are the same place if they are within a certain threshold of each other.
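The distance fallback can be sketched in a few lines of Python. This is only an illustration, not part of geocoder.net: the function names `haversine_m` and `probably_same_place` and the 50-metre default cutoff are assumptions made for the example, and a sensible threshold will depend on how dense the venues are.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def probably_same_place(a, b, threshold_m=50.0):
    """a and b are (lat, lon) pairs; threshold_m is an illustrative cutoff."""
    return haversine_m(a[0], a[1], b[0], b[1]) <= threshold_m
```

Two venues in the same building will usually pass this check, but so will two different shops next door to each other, so distance alone is best treated as one signal alongside the name and address.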
I'm pessimistic that such a tool is already available.
A good solution to match pairs based on the entity resolution literature would be to
Then maybe a closure-like algorithm (closing the set by merging pairs above a given probability threshold) can also help to find all the matches (for example, when different names accumulate for a given venue).
It wouldn't be a bad tool or service, however.
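One way to read the closure step is as a transitive closure over pairwise matches, which a union-find structure handles compactly. The sketch below is an assumption about what that could look like, not an established tool: `cluster_venues`, the venue ID strings, and the 0.8 threshold are all hypothetical, and the pair scores are assumed to come from whatever pairwise matcher is used.

```python
def cluster_venues(venue_ids, scored_pairs, threshold=0.8):
    """Group venues into clusters by merging every pair scored >= threshold.

    venue_ids    -- iterable of hashable venue identifiers (hypothetical format)
    scored_pairs -- iterable of (id_a, id_b, probability) triples
    """
    venue_ids = list(venue_ids)
    parent = {v: v for v in venue_ids}

    def find(v):
        # Walk up to the root, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b, p in scored_pairs:
        if p >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for v in venue_ids:
        clusters.setdefault(find(v), set()).add(v)
    return list(clusters.values())


# Hypothetical usage: IDs prefixed by the service they came from.
venues = ["fs:1", "gw:7", "bk:3", "fs:2"]
pairs = [("fs:1", "gw:7", 0.93), ("gw:7", "bk:3", 0.85), ("fs:1", "fs:2", 0.20)]
groups = cluster_venues(venues, pairs)
```

Here "fs:1" and "bk:3" end up in the same cluster even though they were never compared directly, which is the point of closing the set. The trade-off is that a single false-positive pair can chain two unrelated venues together, so the threshold deserves some care.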