An efficient algorithm for finding the densest areas of points on a geographic map

Published 2024-10-21 14:35:06


Let's say I have a geographical map where points are represented by latitude/longitude.
I have a number of points on this map, and points can be added, deleted, or moved at any time.

What I need is to find the "hottest spots" - the areas with the highest number of points divided by area - in other words, the areas with the highest density of points.

I need an efficient data structure and an algorithm that re-calculates the hottest spots on any change. Both the computational and the memory complexity must be minimal, because the number of points could get very high.

I wish to know and maintain a list of the hottest spots in descending order - the most crowded area first, then the less crowded ones. It's OK for the list to have a limited size - for example, the 100 hottest spots.

Of course, to prevent one isolated point from having 100% density, there is a minimum area (defined as a constant).

The definition of "area" here is any perceivable area on the map that contains points. It could be the whole map, but of course the algorithm shouldn't treat the entire map as a hot spot =)

Thanks ahead!
If it needs any clarification please say so...

Comments (3)

雄赳赳气昂昂 2024-10-28 14:35:06


What you are doing is statistically known as '2-dimensional density estimation' (so you know where to look).

One common method is 'kernel smoothing'. Imagine a disc sitting on each of your data points. The kernel smoothing over the area is the number of discs overlapping at each point. That's a kernel smoothing using a uniform kernel of fixed radius.

You don't have to use uniform kernels, nor do they have to be the same size at all points - this is now "adaptive bandwidth kernel smoothing".

That gives you a grid which you can update pretty easily, especially if you have a finite kernel (you can use a Gaussian (aka Normal) kernel which is infinite, but would be clipped to your study area). Every time you add or take away a point, you add or take away its kernel's contribution to the density.
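As a concrete sketch of that incremental update (assuming a discretized grid and an integer disc radius; the class and its names are illustrative, not from the answer):

```python
import numpy as np

class DensityGrid:
    """Density grid with a uniform (disc-shaped) kernel of fixed radius.

    Adding a point stamps its kernel onto the grid; removing it
    subtracts the same stamp, so updates cost O(radius^2), not O(n).
    """

    def __init__(self, width, height, radius):
        self.grid = np.zeros((height, width))
        self.r = radius
        # Precompute the disc stamp once: a (2r+1) x (2r+1) mask.
        yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
        self.kernel = (xx * xx + yy * yy <= radius * radius).astype(float)

    def _stamp(self, x, y, sign):
        r = self.r
        h, w = self.grid.shape
        # Clip the kernel where it sticks out past the grid edges.
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        x0, x1 = max(0, x - r), min(w, x + r + 1)
        ky0, kx0 = y0 - (y - r), x0 - (x - r)
        self.grid[y0:y1, x0:x1] += sign * self.kernel[
            ky0:ky0 + (y1 - y0), kx0:kx0 + (x1 - x0)]

    def add(self, x, y):
        self._stamp(x, y, +1.0)

    def remove(self, x, y):
        self._stamp(x, y, -1.0)
```

Moving a point is then just a `remove` of the old cell followed by an `add` of the new one.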

So that's half the problem - you now have a grid of densities. Then you can use clustering algorithms to partition it and find separate peaks according to whatever criteria a) defines a 'peak' and b) defines it as separate from an adjacent peak. Someone else has already suggested clustering algorithms. I have done this (ten years ago) using clustering functions in "R", the statistical package. Speed isn't its strong point though...
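For the second half, a minimal, hypothetical way to pull a ranked list of peaks out of such a density grid - using a simple local-maximum criterion instead of a full clustering pass - might look like this:

```python
import numpy as np

def top_hotspots(density, k=100, min_density=1.0):
    """Return up to k (row, col, value) peaks of a 2-D density grid,
    sorted hottest-first.

    Sketch only: a cell counts as a peak when it is >= all 8
    neighbours and above min_density (the 'minimal area' guard from
    the question); real clustering would also merge adjacent peaks.
    """
    h, w = density.shape
    padded = np.pad(density, 1, constant_values=-np.inf)
    # Cell-wise maximum over the 8 neighbour shifts.
    neigh = np.max([padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy, dx) != (0, 0)], axis=0)
    is_peak = (density >= neigh) & (density >= min_density)
    rows, cols = np.nonzero(is_peak)
    spots = sorted(zip(density[rows, cols], rows, cols), reverse=True)
    return [(int(r), int(c), float(v)) for v, r, c in spots[:k]]
```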

You might want to take this over to http://stats.stackexchange.com

蓝海似她心 2024-10-28 14:35:06


This is too long for a comment but here's just an idea as to how I'd "play" with this and see if I can come up with something interesting. But one thing is sure: the following can be made to be very fast.

Can this be easily translated into a discrete problem? You'd first "align" all your coordinates to a big grid (you define how big each square is, and you map every point to one such square). You'd then end up with something like this:

0000000000000000000000000000
00XX000000000000000000X00000
00X00000000000000X0000000000
0000000000000000000000000000
0000000000000000000000000000
000000X00000000X000000000000
0000000000000000000000000000
000000000000X000000000X00000
00000000000000000000000X0000
0000000000000000000000000000

Then, for each occupied entry, you'd count the points in it and its adjacent cells:

0000000000000000000000000000
0033000000000000000001000000
0030000000000000010000000000
0000000000000000000000000000
0000000000000000000000000000
0000001000001001000000000000
0000000000000000000000000000
0000000000001010000000200000
0000000000000000000000020000
0000000000000100000000000000

Then you can augment the size of your square, say by two, and hence divide your map:

(the map isn't correct, it's just to give an idea of what I'm thinking about)

09001001000000
00000000000000
00100001100000
00000110002000
00000002000000
00000100000000

Then you re-count the adjacent neighbours, etc.
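The binning, neighbour-counting, and coarsening steps above could be sketched like this (the function names and the NumPy representation are assumptions for illustration):

```python
import numpy as np

def bin_points(points, cell_size, width, height):
    """Snap (x, y) points onto a grid: each cell counts the points
    that fall inside it (the 'X' map above, as counts)."""
    grid = np.zeros((height // cell_size, width // cell_size), dtype=int)
    for x, y in points:
        grid[int(y // cell_size), int(x // cell_size)] += 1
    return grid

def neighbour_counts(grid):
    """For each occupied cell, count the points in it and its 8
    adjacent cells (the second map in the answer)."""
    h, w = grid.shape
    padded = np.pad(grid, 1)
    window = sum(padded[dy:dy + h, dx:dx + w]
                 for dy in range(3) for dx in range(3))
    return window * (grid > 0)  # empty cells stay 0, as in the example

def coarsen(grid):
    """Double the cell size by summing each 2x2 block of cells."""
    h2, w2 = grid.shape[0] // 2 * 2, grid.shape[1] // 2 * 2
    g = grid[:h2, :w2]  # drop an odd trailing row/column, if any
    return g.reshape(h2 // 2, 2, w2 // 2, 2).sum(axis=(1, 3))
```

Repeatedly applying `coarsen` and re-running `neighbour_counts` gives the multi-resolution view the answer describes.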

To me this would allow finding hotspots depending on your "resolution": you'd simply look for the biggest numbers, and those would be your "hotspots".

Because in this case:

0000X00000
0000XX0000
0000000000
0000000000
0Y0Y000000
0000000000
0Y0Y000000

Either 'X' can be the hottest spot (three interesting points close to each other) or 'Y' (four points close to each other, but not as close as the 'X' points).

Because you said you needed speed, I'd just turn this into a discrete problem and represent my graphs as arrays. I'd then allow for a variable "area" size.

Looks like a fun problem to work on :)

吲‖鸣 2024-10-28 14:35:06


The type of algorithm used will depend on the distribution of points. You could very well run into the much harder problem of partitioning groups into "areas." Since you seem to need something to get started, I recommend you read up on convex hulls, calculating the area of an arbitrary 2D polygon, and how to determine whether a point is in a polygon.
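For the polygon-area and point-in-polygon parts, the standard techniques are the shoelace formula and ray casting; a small illustrative sketch:

```python
def polygon_area(vertices):
    """Shoelace formula: area of an arbitrary simple 2-D polygon,
    given as a list of (x, y) vertices in order around the boundary."""
    n = len(vertices)
    s = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def point_in_polygon(pt, vertices):
    """Ray casting: a point is inside iff a horizontal ray from it
    crosses the polygon's edges an odd number of times."""
    x, y = pt
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's height
            # x-coordinate where this edge crosses the ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```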
