Spatial partitioning algorithm
I have a set of points contained within a rectangle. I'd like to split the rectangle into subrectangles based on point density (specifying either a number of subrectangles or a desired density, whichever is easiest).
The partitioning doesn't have to be exact (almost any approximation better than a regular grid would do), but the algorithm has to cope with a large number of points: approximately 200 million. The desired number of subrectangles, however, is substantially lower (around 1000).
Does anyone know any algorithm which may help me with this particular task?
8 Answers
Just to understand the problem.
The following is crude and performs badly, but I want to know whether the result is what you want:
Assumption: the number of rectangles is even.
Assumption: the point distribution is genuinely two-dimensional (no large accumulation along a single line).
Procedure:
Bisect n/2 times along either axis: for each previously determined rectangle, sweep from one end to the other, counting the points "passed" and storing the running count at each step. Once counted, bisect the rectangle at the position selected from the counts of each sweep.
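A minimal Python sketch of that count-and-bisect idea, assuming the linear "sweep" can be replaced by a sort that finds the same halfway coordinate (all names are illustrative):

```python
import numpy as np

def bisect_by_count(points, depth, axis=0):
    """Recursively bisect a point set so each half holds ~50% of the points.

    points : (N, 2) array of x/y coordinates
    depth  : number of bisection levels, yielding 2**depth cells
    axis   : axis to split at this level (alternates 0 -> 1 -> 0 ...)
    """
    if depth == 0 or len(points) == 0:
        yield points
        return
    # Sorting and cutting at the middle element is equivalent to sweeping
    # along the axis and stopping once half the points have been "passed".
    order = np.argsort(points[:, axis])
    half = len(points) // 2
    yield from bisect_by_count(points[order[:half]], depth - 1, axis=1 - axis)
    yield from bisect_by_count(points[order[half:]], depth - 1, axis=1 - axis)

# Example: split 1,000,000 random points into 2**10 = 1024 cells.
pts = np.random.rand(1_000_000, 2)
cells = list(bisect_by_count(pts, depth=10))
print(len(cells), len(cells[0]))
```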
Is that what you want to achieve?
I think I'd start with the following, which is close to what @belisarius already proposed. If you have any additional requirements, such as preferring 'nearly square' rectangles to 'long and thin' ones you'll need to modify this naive approach. I'll assume, for the sake of simplicity, that the points are approximately randomly distributed.
I hope that outlines the proposal well enough. It has limitations: it will produce a number of rectangles equal to some power of 2, so adjust it if that's not good enough. I've phrased it recursively, but it's ideal for parallelisation. Each split creates two tasks, each of which splits a rectangle and creates two more tasks.
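A rough Python sketch of this recursive scheme, under the assumption that each rectangle is cut at the median point coordinate along its longer side (the names and the depth parameter are illustrative, not the exact outline):

```python
import numpy as np

def split_rect(points, xmin, xmax, ymin, ymax, depth):
    """Recursively split a rectangle along its longer side at the median
    coordinate of the points inside it; returns (xmin, xmax, ymin, ymax)
    tuples, 2**depth of them when every branch has points to split."""
    if depth == 0 or len(points) < 2:
        return [(xmin, xmax, ymin, ymax)]
    if (xmax - xmin) >= (ymax - ymin):            # cut the longer side
        cut = np.median(points[:, 0])
        left, right = points[points[:, 0] <= cut], points[points[:, 0] > cut]
        return (split_rect(left, xmin, cut, ymin, ymax, depth - 1) +
                split_rect(right, cut, xmax, ymin, ymax, depth - 1))
    else:
        cut = np.median(points[:, 1])
        lower, upper = points[points[:, 1] <= cut], points[points[:, 1] > cut]
        return (split_rect(lower, xmin, xmax, ymin, cut, depth - 1) +
                split_rect(upper, xmin, xmax, cut, ymax, depth - 1))

pts = np.random.rand(500_000, 2)
rects = split_rect(pts, 0.0, 1.0, 0.0, 1.0, depth=10)   # ~1024 rectangles
print(len(rects))
```

The two recursive calls at each level are independent, which is what makes the scheme easy to farm out to parallel tasks.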
If you don't like that approach, perhaps you could start with a regular grid with some multiple (10 - 100 perhaps) of the number of rectangles you want. Count the number of points in each of these tiny rectangles. Then start gluing the tiny rectangles together until the less-tiny rectangle contains (approximately) the right number of points. Or, if it satisfies your requirements well enough, you could use this as a discretisation method and integrate it with my first approach, but only place the cutting lines along the boundaries of the tiny rectangles. This would probably be much quicker as you'd only have to count the points in each tiny rectangle once.
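One way the grid-and-glue variant might look in Python, assuming the gluing merges fine columns into vertical strips of roughly equal mass and then merges rows within each strip (the merge order is an assumption; the points only need to be counted once, via numpy.histogram2d):

```python
import numpy as np

def grid_partition(points, fine=128, n_strips=32, cells_per_strip=32):
    """Count points on a fine regular grid, then glue fine columns into
    vertical strips of roughly equal mass and glue rows within each strip."""
    counts, xedges, yedges = np.histogram2d(points[:, 0], points[:, 1],
                                            bins=fine)
    rects = []
    # Glue fine columns into n_strips strips of roughly equal point count.
    col_cum = np.cumsum(counts.sum(axis=1))
    targets = np.linspace(0, col_cum[-1], n_strips + 1)[1:-1]
    col_cuts = [0] + list(np.searchsorted(col_cum, targets) + 1) + [fine]
    for c0, c1 in zip(col_cuts[:-1], col_cuts[1:]):
        if c1 <= c0:
            continue
        # Within the strip, glue fine rows into cells of roughly equal count.
        row_cum = np.cumsum(counts[c0:c1].sum(axis=0))
        t = np.linspace(0, row_cum[-1], cells_per_strip + 1)[1:-1]
        row_cuts = [0] + list(np.searchsorted(row_cum, t) + 1) + [fine]
        for r0, r1 in zip(row_cuts[:-1], row_cuts[1:]):
            if r1 > r0:
                rects.append((xedges[c0], xedges[c1], yedges[r0], yedges[r1]))
    return rects

pts = np.random.rand(1_000_000, 2)
print(len(grid_partition(pts)))        # roughly 32 * 32 = 1024 rectangles
```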
I haven't really thought about the running time of either of these; I have a preference for the former approach 'cos I do a fair amount of parallel programming and have oodles of processors.
You're after a standard Kd-tree or binary space partitioning tree, I think. (You can look it up on Wikipedia.)
Since you have very many points, you may wish to only approximately partition the first few levels. In this case, you should take a random sample of your 200M points--maybe 200k of them--and split the full data set at the midpoint of the subsample (along whichever axis is longer). If you actually choose the points at random, the probability that you'll miss a huge cluster of points that need to be subdivided will be approximately zero.
Now you have two problems of about 100M points each. Divide each along its longer axis. Repeat; once the subproblems are small enough, stop taking subsamples and split on the full data within each rectangle. After ten breadth-first iterations you'll be done.
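A small Python sketch of the subsampling trick for a single level, assuming the split coordinate is estimated as the median of a 200k random sample (sample size and names are illustrative):

```python
import numpy as np

def approx_split_value(points, axis, sample_size=200_000, rng=None):
    """Estimate the median along `axis` from a random subsample, so a split
    point for 200M points can be chosen without sorting them all."""
    rng = rng or np.random.default_rng()
    idx = rng.integers(0, len(points), size=min(sample_size, len(points)))
    return np.median(points[idx, axis])

# One level of the Kd-style split: estimate the cut, then partition.
pts = np.random.rand(2_000_000, 2)          # stand-in for the 200M points
cut = approx_split_value(pts, axis=0)
left, right = pts[pts[:, 0] <= cut], pts[pts[:, 0] > cut]
print(len(left), len(right))                # roughly equal halves
```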
If you have a different problem--you must provide tick marks along the X and Y axis and fill in a grid along those as best you can, rather than having the irregular decomposition of a Kd-tree--take your subsample of points and find the 0/32, 1/32, ..., 32/32 percentiles along each axis. Draw your grid lines there, then fill the resulting 1024-element grid with your points.
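A Python sketch of that percentile-grid variant, assuming a random subsample supplies the tick marks and numpy.histogram2d does the final binning (the few points falling outside the sampled min/max are simply dropped here):

```python
import numpy as np

def percentile_grid(points, n=32, sample_size=200_000, rng=None):
    """Place tick marks at the 0/n, 1/n, ..., n/n percentiles of a random
    subsample along each axis, then count the full point set on the
    resulting n-by-n grid."""
    rng = rng or np.random.default_rng()
    idx = rng.integers(0, len(points), size=min(sample_size, len(points)))
    sample = points[idx]
    q = np.linspace(0, 100, n + 1)
    xedges = np.percentile(sample[:, 0], q)
    yedges = np.percentile(sample[:, 1], q)
    counts, _, _ = np.histogram2d(points[:, 0], points[:, 1],
                                  bins=[xedges, yedges])
    return xedges, yedges, counts            # counts has shape (n, n)

pts = np.random.rand(1_000_000, 2)
xe, ye, counts = percentile_grid(pts)
print(counts.shape, int(counts.sum()))       # (32, 32), close to 1,000,000
```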
R-tree
Good question.
I think the area you need to investigate is "computational geometry" and the "k-partitioning" problem. There's a link that might help get you started here.
You might find that the problem itself is NP-hard, which means a good approximation algorithm is the best you're going to get.
Would K-means clustering or a Voronoi diagram be a good fit for the problem you are trying to solve?
That looks like cluster analysis.
Would a QuadTree work?