How to group latitude/longitude points that are 'close' to each other?

Posted 2024-10-05 18:12:22

I have a database of user-submitted latitude/longitude points and am trying to group 'close' points together. 'Close' is relative, but for now it seems to be ~500 feet.

At first it seemed I could just group rows that have the same latitude/longitude to the first 3 decimal places (roughly a 300x300 ft box, understanding that it changes as you move away from the equator).

However, that method seems to be quite lacking. 'Closeness' can't be significantly different than the distance each decimal place represents. It doesn't take into account that two locations may have different digits in the 3rd (or any) decimal place, but still be within the distance that place represents (33.1239 and 33.1240).

I've also mulled over the situation where Point A and Point C are both 'close' to Point B (but not to each other) - should they be grouped together? If so, what happens when Point D is 'close' to Point C (and no other points) - should it be grouped as well? Certainly I have to determine the desired behavior, but how would either be implemented?

Can anyone point me in the right direction as to how this can be done and what different methods/approaches can be used?

I feel a bit like I'm missing something obvious.

Currently the data is in a MySQL database, used by a PHP application; however, I'm open to other storage methods if they're a key part in accomplishing this.

Comments (6)

浊酒尽余欢 2024-10-12 18:12:22

There are a number of ways of determining the distance between two points, but for plotting points on a 2-D graph you probably want the Euclidean distance. If (x1, y1) represents your first point and (x2, y2) represents your second, the distance is

d = sqrt( (x2-x1)^2 + (y2-y1)^2 )

Regarding grouping, you may want to use some sort of 2-D mean to determine how "close" things are to each other. For example, if you have three points, (x1, y1), (x2, y2), (x3, y3), you can find the center of these three points by simple averaging:

x(mean) = (x1+x2+x3)/3
y(mean) = (y1+y2+y3)/3

You can then see how close each is to the center to determine whether it should be part of the "cluster".
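
The averaging idea above could be sketched roughly as follows. This assumes the points are already in a flat x/y plane with equal-length units (for example, projected to feet or meters), since raw latitude/longitude degrees are not equal-length units. The function names, the sample points and the 500-unit radius are made up for illustration.

```python
from math import hypot

def centroid(points):
    """Return the simple mean (x, y) of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def within_radius(points, center, radius):
    """Return the points whose Euclidean distance to center is <= radius."""
    cx, cy = center
    return [(x, y) for x, y in points if hypot(x - cx, y - cy) <= radius]

# Hypothetical planar coordinates (e.g. feet), not values from the question.
points = [(0.0, 0.0), (300.0, 0.0), (0.0, 300.0), (5000.0, 5000.0)]
center = centroid(points[:3])            # mean of the first three points
members = within_radius(points, center, 500.0)
```

Here the far-away fourth point is left out of `members`, while the three points used for the mean all fall inside the radius.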


There are a number of ways one can define clusters, all of which use some variant of a clustering algorithm. I'm in a rush now and don't have time to summarize, but check out the link and the algorithms, and hopefully other people will be able to provide more detail. Good luck!

柒七 2024-10-12 18:12:22

Use something similar to the method you outlined in your question to get an approximate set of results, then whittle that approximate set down by doing proper calculations. If you pick your grid size (i.e. how much you round off your co-ordinates) correctly, you can at least hope to reduce the amount of work to be done to an acceptable level, although you have to manage what that grid size is.

For example, the earthdistance extension to PostgreSQL works by converting lat/long pairs to (x,y,z) cartesian co-ordinates, modelling the Earth as a uniform sphere. PostgreSQL has a sophisticated indexing system that allows these co-ordinates, or boxes around them, to be indexed into R-trees, but you can whack something together that is still useful without that.

If you take your (x,y,z) triple and round it off (i.e. multiply by some factor and truncate to integer), you then have three integers that you can concatenate to produce a "box name", which identifies the box in your "grid" that the point is in.
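
A minimal sketch of the box-name idea, assuming a spherical Earth with a mean radius of 6371 km. The function names, the `:` separator and the `cell_km` parameter are hypothetical choices for illustration. This sketch uses `floor` instead of plain truncation so that the boxes on either side of zero have the same size; that is a small deviation from the description above.

```python
from math import cos, sin, radians, floor

EARTH_RADIUS_KM = 6371.0  # assumed mean radius; the Earth is modelled as a sphere

def to_xyz(lat_deg, lon_deg, radius=EARTH_RADIUS_KM):
    """Convert latitude/longitude in degrees to Cartesian (x, y, z) in km."""
    lat, lon = radians(lat_deg), radians(lon_deg)
    return (radius * cos(lat) * cos(lon),
            radius * cos(lat) * sin(lon),
            radius * sin(lat))

def box_name(lat_deg, lon_deg, cell_km=1.0):
    """Name of the grid cube of side cell_km that contains the point."""
    # floor (rather than int truncation) keeps boxes the same size across zero
    ix, iy, iz = (floor(c / cell_km) for c in to_xyz(lat_deg, lon_deg))
    return f"{ix}:{iy}:{iz}"
```

For example, `box_name(0.0, 0.0)` gives `"6371:0:0"`. Storing this string in an indexed column would turn "which points share my box" into a plain equality lookup.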

If you want to search for all points within X km of some target point, convert the target point to an (x,y,z) triple as well and generate all the "box names" around it. Then eliminate the boxes that don't intersect the Earth's surface (trickier, but comparing x^2+y^2+z^2 with R^2 at each corner will tell you). You end up with a list of boxes that matching points can be in, so just search for all points in one of those boxes. That will also return some extra points, so as a final stage you need to calculate the actual distance to your target point and eliminate the points that are too far away (again, this can be sped up by working in Cartesian co-ordinates and converting your target great-circle distance radius to a secant distance).
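
The final filtering step might look like the sketch below. It assumes a spherical Earth of radius 6371 km and uses the standard relation between a great-circle distance d and the straight-line (secant) distance through the sphere, 2R·sin(d / 2R). The function names are hypothetical.

```python
from math import sin

EARTH_RADIUS_KM = 6371.0  # assumed mean radius for a spherical Earth

def chord_km(great_circle_km, radius=EARTH_RADIUS_KM):
    """Straight-line (secant) distance matching a great-circle distance."""
    return 2.0 * radius * sin(great_circle_km / (2.0 * radius))

def within_chord(p, q, max_chord_km):
    """True if Cartesian points p and q (km) are at most max_chord_km apart."""
    # compare squared distances to avoid a square root per candidate
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return d2 <= max_chord_km ** 2
```

The search radius is converted once with `chord_km`, and each candidate returned by the box lookup is then checked with `within_chord`, so no trigonometry is needed per candidate.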

The fiddling around comes down to making sure you don't have to search too many boxes, but at the same time don't bring in too many extra points. I've found it useful to index each point on several different grids (e.g. resolutions of 1 km, 5 km, 25 km, 125 km etc.). Ideally you want to be searching just one box, but remember it expands to at least 27 as soon as your target radius exceeds your grid size.

I've used this technique to construct a spatial index using Lucene rather than doing calculations in a SQL database. It does work, although there is some fiddling to set it up, and the indices take a while to generate and are quite big. Using an R-tree to hold all the co-ordinates is a much nicer approach, but would take more custom coding. This technique basically just requires a fast hash-table lookup (so it would probably work well with all the NoSQL databases that are the rage these days, and should be usable in a SQL database too).

南风起 2024-10-12 18:12:22

Maybe overkill, but it seems to me to be a clustering problem: the distance measure determines how the similarity of two elements is calculated. If you need a less naive solution, try Data Mining: Practical Machine Learning Tools and Techniques, and use Weka or Orange.
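
To make the clustering view concrete, here is a small sketch of one possible approach: single-linkage grouping with a distance threshold, where points connected by a chain of 'close' pairs end up in one group (the A-B-C-D case from the question). It does not use Weka or Orange. The function names are hypothetical, the Earth is assumed to be a sphere of radius 6371 km, and the pairwise loop is O(n^2), so it only suits small data sets.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(p, q, radius_m=6371000.0):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * radius_m * asin(sqrt(a))

def group_close_points(points, max_dist_m):
    """Single-linkage grouping: points joined by a chain of close pairs share a group."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # O(n^2) pair check; fine for a sketch, too slow for large tables
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_m(points[i], points[j]) <= max_dist_m:
                parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())
```

With a threshold of 152.4 m (about 500 feet), three points spaced 0.001 degrees of latitude apart land in one group even though the two outer ones are more than 500 feet from each other. If that chaining is not wanted, a different clustering rule is needed.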

橘虞初梦 2024-10-12 18:12:22

If I were tackling it, I'd start with a grid. Put each point into a square on the grid. Look for grids that are densely populated. If the adjacent grids aren't populated, then you have a decent group.

If you have adjacent densely populated grids, you can always drop a circle at the center of each grid and optimize for circle area vs (number of points in the circle * some tunable weight). Not perfect, but easy. Better groupings are much more complicated optimization problems.
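
A rough sketch of the first step (bucket points into grid squares, then look for dense squares whose neighbours are empty). It assumes planar coordinates in consistent units; the function names, the `cell_size` and `min_points` parameters, and what counts as 'dense' are all illustrative assumptions. The circle optimisation for adjacent dense squares is not shown.

```python
from collections import Counter
from math import floor

def cell_of(x, y, cell_size):
    """Grid square (column, row) containing the planar point (x, y)."""
    return (floor(x / cell_size), floor(y / cell_size))

def isolated_dense_cells(points, cell_size, min_points):
    """Cells holding at least min_points points whose 8 neighbours are all empty."""
    counts = Counter(cell_of(x, y, cell_size) for x, y in points)
    result = []
    for (cx, cy), n in counts.items():
        if n < min_points:
            continue
        neighbours = [(cx + dx, cy + dy)
                      for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                      if (dx, dy) != (0, 0)]
        if all(counts.get(c, 0) == 0 for c in neighbours):
            result.append((cx, cy))
    return result
```

Squares that are dense but have populated neighbours are skipped by this function; those are the cases the second paragraph above is about.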

一张白纸 2024-10-12 18:12:22

Facing a similar issue, I just floored the longitude and latitude until I got the required 'closeness' in meters. In my case, flooring to 4 decimal digits grouped locations that are approx. 13 meters apart.

If the longitude or latitude is negative, replace floor with ceil.

First FLOOR (or CEIL) to the required precision, and then GROUP BY the rounded longitude and latitude.
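
As an illustration only, the floor-or-ceil rule is the same as truncating toward zero, so the grouping could be sketched like this in Python. The function names are hypothetical, and the default of 4 digits follows the figure above.

```python
from collections import defaultdict
from math import trunc

def truncate(value, digits):
    """Drop digits beyond `digits` decimals (floor if positive, ceil if negative)."""
    factor = 10 ** digits
    return trunc(value * factor) / factor

def group_by_truncated(points, digits=4):
    """Group (lat, lon) points that share the same truncated coordinates."""
    groups = defaultdict(list)
    for lat, lon in points:
        groups[(truncate(lat, digits), truncate(lon, digits))].append((lat, lon))
    return dict(groups)
```

In MySQL, the built-in `TRUNCATE(X, D)` function also truncates toward zero, so it should allow the same grouping directly in a `GROUP BY`. Note that two points just on either side of a cell boundary still end up in different groups, however close they are.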

The code to measure the distance between two geo locations was borrowed from "Getting distance between two points based on latitude/longitude":

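from math import sin, cos, sqrt, atan2, radians

# Approximate radius of the Earth in km
R = 6373.0

# The two points, converted from degrees to radians
lat1 = radians(48.71953)
lon1 = radians(-73.72882)
lat2 = radians(48.719)
lon2 = radians(-73.728)

dlon = lon2 - lon1
dlat = lat2 - lat1

# Haversine formula for the great-circle distance
a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
c = 2 * atan2(sqrt(a), sqrt(1 - a))

# R * c is in km; multiply by 1000 for meters
distance = (R * c) * 1000

print("Distance in meters:", round(distance))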

Distance in meters: 84

As expected, for the same angle the distance is larger toward the equator and smaller toward the poles.
For the same coordinates but on the equator, the distance is 109 meters (modify the latitudes to 0.71953 and 0.719).

Below, I varied the number of decimal digits, always changing the last digit by one step in both the longitude and the latitude, and measured the resulting distances:

lat1 = radians(48.71953)
lon1 = radians(-73.72882)
lat2 = radians(48.71954)
lon2 = radians(-73.72883)
Distance in meters  1

lat1 = radians(48.7195)
lon1 = radians(-73.7288)
lat2 = radians(48.7196)
lon2 = radians(-73.7289)
Distance in meters  13

lat1 = radians(48.719)
lon1 = radians(-73.728)
lat2 = radians(48.720)
lon2 = radians(-73.729)
Distance in meters  133

lat1 = radians(48.71)
lon1 = radians(-73.72)
lat2 = radians(48.72)
lon2 = radians(-73.73)
Distance in meters  1333

Summary: flooring (or ceiling) the longitude and latitude to 4 decimal digits will help you group locations that are approximately 13 meters apart.
This number changes according to the equation above: it is larger near the equator and smaller toward the poles.

空城之時有危險 2024-10-12 18:12:22

If you are considering latitude and longitude, there are several factors to consider in real-world data: obstructions, such as rivers and lakes, and facilities, such as bridges and tunnels. You cannot simply group the points; a simple algorithm such as k-means will not be able to group them properly. I think you should go for a spatial clustering method, such as the partitioning-based CLARANS method.
