Entity resolution for venues and other geographic locations

Posted on 2024-08-21 20:10:26

Say I want to build a check-in aggregator that counts visits across platforms, so that I can know for a given place how many people have checked in there on Foursquare, Gowalla, BrightKite, etc. Is there a good library or set of tools I can use out of the box to associate the venue entries in each service with a unique place identifier of my own?

I basically want a function that maps a pair of (placename, address, lat/long) tuples to a confidence in [0,1) that they refer to the same real-world location.

Someone must have done this already, but my google-fu is weak.

Comments (2)

盛夏尉蓝 2024-08-28 20:10:26

Yes, you can submit the two addresses to geocoder.net (assuming you're a .NET developer; you didn't say). It provides a common interface for address verification and geocoding, so once both addresses are standardized you can be reasonably sure whether one matches the other.

If you can't get them to standardize and match, you can compute the distance between their coordinates and assume they are the same place if that distance is below a certain threshold.
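The distance fallback can be sketched in a few lines of Python. This is only an illustration: the haversine formula is one common way to get a great-circle distance, and the function names and the 50 m default threshold are assumptions of mine, not something geocoder.net provides.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error near antipodal points.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def probably_same_place(a, b, threshold_m=50.0):
    """True if two (lat, lon) pairs lie within threshold_m of each other.

    The 50 m default is an arbitrary assumption; tune it per data source.
    """
    return haversine_m(a[0], a[1], b[0], b[1]) <= threshold_m
```

A fixed threshold is crude: dense city blocks pack many venues into 50 m, while a park or airport may be reported hundreds of metres apart by different services.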

神妖 2024-08-28 20:10:26

I'm pessimistic that such a tool is already available.

A good solution to match pairs based on the entity resolution literature would be to

  • get the placenames, then define and use a good distance function on them (e.g. edit distance),
  • get the addresses, standardize them (e.g. with the geocoder.net tools mentioned above), and define a distance between them as well,
  • get the coordinates and compute the distance between them (this is easy: there are lots of libraries and tools for geographic distance calculations, and it seems to be a good metric),
  • turn the distances into probabilities ("what is the probability of such a distance, if we suppose these are the same place?") (not straightforward),
  • and combine the probabilities (also not straightforward).
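As a rough illustration of the steps above, here is a minimal Python sketch. It does not solve the two "not straightforward" steps: the exponential decay on geographic distance, the 100 m scale, and the weights are all assumptions I picked for the example, and the result is a heuristic score in [0, 1] rather than a calibrated probability. The addresses are assumed to be standardized already, and the geographic distance is passed in as metres.

```python
import math


def edit_distance(s, t):
    """Levenshtein distance, computed with a two-row dynamic programme."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (cs != ct)))  # substitution
        prev = curr
    return prev[-1]


def string_similarity(a, b):
    """Edit distance normalised to [0, 1]; 1.0 means identical after case folding."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def geo_similarity(distance_m, scale_m=100.0):
    """Map a distance in metres to (0, 1]; the decay shape and scale are assumptions."""
    return math.exp(-distance_m / scale_m)


def match_score(name_a, name_b, addr_a, addr_b, distance_m,
                weights=(0.4, 0.2, 0.4)):
    """Weighted mean of name, address and geographic similarity (hypothetical weights)."""
    sims = (string_similarity(name_a, name_b),
            string_similarity(addr_a, addr_b),
            geo_similarity(distance_m))
    return sum(w * s for w, s in zip(weights, sims))
```

With labelled pairs of known matches and non-matches, the hand-picked weights could be replaced by a fitted model, which is closer to what the entity resolution literature does.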

Then a closure-like algorithm (closing the set by merging pairs whose probability is above a given threshold) may also help to find all the matches, for example when different names accumulate for a given venue.
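One way to read the closure idea is as a transitive closure over the accepted pairs, which a union-find structure computes cheaply. The sketch below assumes the venues are indexed 0..n-1 and that some pairwise scorer has already produced (i, j, score) triples; the function name and the 0.8 threshold are hypothetical.

```python
def cluster_matches(n_items, scored_pairs, threshold=0.8):
    """Group item indices into clusters by merging every pair scoring >= threshold.

    scored_pairs is an iterable of (i, j, score) triples.
    Returns a sorted list of clusters, each a sorted list of indices.
    """
    parent = list(range(n_items))

    def find(x):
        # Walk up to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, score in scored_pairs:
        if score >= threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    clusters = {}
    for item in range(n_items):
        clusters.setdefault(find(item), []).append(item)
    return sorted(clusters.values())
```

For example, `cluster_matches(5, [(0, 1, 0.9), (1, 2, 0.85), (3, 4, 0.5)])` puts items 0, 1 and 2 in one cluster even though 0 and 2 were never compared directly. The same property means a single false match can chain two unrelated venues together, so the threshold should be conservative.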

It wouldn't be a bad tool or service to build, though.
