空间划分算法

发布于 2024-09-04 03:20:24 字数 175 浏览 3 评论 0原文

我有一组包含在矩形内的点。我想根据点密度将矩形分割成子矩形(给出多个子矩形或所需的密度,以最简单的为准)。

分区不必是精确的(几乎任何比常规网格更好的近似都会做),但算法必须处理大量的点 - 大约。 2亿。然而,所需的子矩形数量要少得多(大约 1000)。

有谁知道任何算法可以帮助我完成这个特定的任务?

I have a set of points which are contained within the rectangle. I'd like to split the rectangles into subrectangles based on point density (giving a number of subrectangles or desired density, whichever is easiest).

The partitioning doesn't have to be exact (almost any approximation better than regular grid would do), but the algorithm has to cope with the large number of points - approx. 200 millions. The desired number of subrectangles however is substantially lower (around 1000).

Does anyone know any algorithm which may help me with this particular task?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(8

够钟 2024-09-11 03:20:24

只是为了了解问题。
下面的内容很粗糙,表现很差,但我想知道结果是不是你想要的>

假设>矩形数量为偶数
假设>点分布明显是二维的(一行中没有大的积累)

程序>
在任一轴上平分 n/2 次,从每个先前确定的矩形的一端循环到另一端,计数“通过”点并存储每次迭代时通过的点的数量。计数完毕后,将矩形一分为二,选择每个循环中计数的点。

这是您想要实现的目标吗?

Just to understand the problem.
The following is crude and perform badly, but I want to know if the result is what you want>

Assumption> Number of rectangles is even
Assumption> Point distribution is markedly 2D (no big accumulation in one line)

Procedure>
Bisect n/2 times in either axis, looping from one end to the other of each previously determined rectangle counting "passed" points and storing the number of passed points at each iteration. Once counted, bisect the rectangle selecting by the points counted in each loop.

Is that what you want to achieve?

拧巴小姐 2024-09-11 03:20:24

我想我会从以下内容开始,这与 @belisarius 已经提出的内容很接近。如果您有任何其他要求,例如更喜欢“接近正方形”的矩形而不是“又长又薄”的矩形,您将需要修改这种简单的方法。为了简单起见,我假设这些点近似随机分布。

  1. 用一条与矩形短边平行并恰好穿过中点的线将您的初始矩形一分为二。
  2. 计算两个半矩形中的点数。如果它们相等(足够),则转至步骤 4。否则,转至步骤 3。
  3. 根据半矩形之间点的分布,移动线以再次均匀。因此,如果第一次切割将点分割为 1/3、2/3,则将线移至矩形较重的一半。转到步骤 2。(小心不要被困在这里,首先向一个方向移动直线,然后再向另一个方向移动递减的步长。)
  4. 现在,将每个半矩形传递到对此函数的递归调用,位于第 1 步。

我希望这足以很好地概述该提案。它有局限性:它会生成一些等于 2 的幂的矩形,因此如果这还不够好,请调整它。我已经用递归的方式表达了它,但它对于并行化来说是理想的。每次分割都会创建两个任务,每个任务分割一个矩形并再创建两个任务。

如果您不喜欢这种方法,也许您可​​以从常规网格开始,其中包含您想要的矩形数量的倍数(也许是 10 - 100 个)。计算每个小矩形中的点数。然后开始将小矩形粘合在一起,直到较小的矩形包含(大约)正确数量的点。或者,如果它足够好地满足您的要求,您可以使用它作为离散化方法并将其与我的第一种方法集成,但仅将切割线沿着小矩形的边界放置。这可能会快得多,因为您只需对每个小矩形中的点进行一次计数。

我还没有真正考虑过其中任何一个的运行时间;我更喜欢前一种方法,因为我进行了大量的并行编程并且拥有大量的处理器。

I think I'd start with the following, which is close to what @belisarius already proposed. If you have any additional requirements, such as preferring 'nearly square' rectangles to 'long and thin' ones you'll need to modify this naive approach. I'll assume, for the sake of simplicity, that the points are approximately randomly distributed.

  1. Split your initial rectangle in 2 with a line parallel to the short side of the rectangle and running exactly through the mid-point.
  2. Count the number of points in both half-rectangles. If they are equal (enough) then go to step 4. Otherwise, go to step 3.
  3. Based on the distribution of points between the half-rectangles, move the line to even things up again. So if, perchance, the first cut split the points 1/3, 2/3, move the line half-way into the heavy half of the rectangle. Go to step 2. (Be careful not to get trapped here, moving the line in ever decreasing steps first in one direction, then the other.)
  4. Now, pass each of the half-rectangles in to a recursive call to this function, at step 1.

I hope that outlines the proposal well enough. It has limitations: it will produce a number of rectangles equal to some power of 2, so adjust it if that's not good enough. I've phrased it recursively, but it's ideal for parallelisation. Each split creates two tasks, each of which splits a rectangle and creates two more tasks.

If you don't like that approach, perhaps you could start with a regular grid with some multiple (10 - 100 perhaps) of the number of rectangles you want. Count the number of points in each of these tiny rectangles. Then start gluing the tiny rectangles together until the less-tiny rectangle contains (approximately) the right number of points. Or, if it satisfies your requirements well enough, you could use this as a discretisation method and integrate it with my first approach, but only place the cutting lines along the boundaries of the tiny rectangles. This would probably be much quicker as you'd only have to count the points in each tiny rectangle once.

I haven't really thought about the running time of either of these; I have a preference for the former approach 'cos I do a fair amount of parallel programming and have oodles of processors.

缱倦旧时光 2024-09-11 03:20:24

我认为您正在寻找标准的 Kd 树或二元空间划分树。 (您可以在维基百科上查找。)

由于您有很多点,您可能希望仅大致划分前几个级别。在这种情况下,您应该从 200M 点(也许是 200k 个)中随机抽取样本,并在子样本的中点(沿较长的轴)分割整个数据集。如果您实际上随机选择点,那么您错过大量需要细分的点的概率将大约为零。

现在你有两个问题,每个问题大约有 100M 点。沿长轴将每个分开。重复此操作,直到停止提取子样本并沿整个数据集进行分割。经过十次广度优先迭代后,您就完成了。

如果你有一个不同的问题 - 你必须沿着 X 和 Y 轴提供刻度线并尽可能地沿着这些刻度填充网格,而不是对 Kd 树进行不规则分解 - 获取点的子样本并沿每个轴查找 0/32、1/32、...、32/32 百分位数。在那里绘制网格线,然后用您的点填充生成的 1024 个元素的网格。

You're after a standard Kd-tree or binary space partitioning tree, I think. (You can look it up on Wikipedia.)

Since you have very many points, you may wish to only approximately partition the first few levels. In this case, you should take a random sample of your 200M points--maybe 200k of them--and split the full data set at the midpoint of the subsample (along whichever axis is longer). If you actually choose the points at random, the probability that you'll miss a huge cluster of points that need to be subdivided will be approximately zero.

Now you have two problems of about 100M points each. Divide each along the longer axis. Repeat until you stop taking subsamples and split along the whole data set. After ten breadth-first iterations you'll be done.

If you have a different problem--you must provide tick marks along the X and Y axis and fill in a grid along those as best you can, rather than having the irregular decomposition of a Kd-tree--take your subsample of points and find the 0/32, 1/32, ..., 32/32 percentiles along each axis. Draw your grid lines there, then fill the resulting 1024-element grid with your points.

尝蛊 2024-09-11 03:20:24

好问题。

我认为你需要研究的领域是“计算几何”和“k-划分”问题。 此处有一个链接可能会帮助您开始,

您可能会发现问题本身是 NP -hard 这意味着一个好的近似算法是你能得到的最好的。

Good question.

I think the area you need to investigate is "computational geometry" and the "k-partitioning" problem. There's a link that might help get you started here

You might find that the problem itself is NP-hard which means a good approximation algorithm is the best you're going to get.

孤蝉 2024-09-11 03:20:24

K-means 聚类Voronoi 图 非常适合您要解决的问题?

Would K-means clustering or a Voronoi diagram be a good fit for the problem you are trying to solve?

浅笑依然 2024-09-11 03:20:24

这看起来像集群分析

That's looks like Cluster analysis.

旧情别恋 2024-09-11 03:20:24

QuadTree 可以工作吗?

四叉树是一种树形数据结构,其中每个内部节点恰好有四个子节点。四叉树最常用于通过递归地将二维空间细分为四个象限或区域来划分二维空间。这些区域可以是正方形或矩形,或者可以具有任意形状。 Raphael Finkel 和 JL Bentley 于 1974 年将这种数据结构命名为四叉树。类似的划分也称为 Q 树。所有形式的四叉树都有一些共同的特征:

  • 它们将空间分解为可适应的单元
  • 每个单元(或桶)都有最大容量。当达到最大容量时,桶分裂
  • 树目录遵循四叉树的空间分解

Would a QuadTree work?

A quadtree is a tree data structure in which each internal node has exactly four children. Quadtrees are most often used to partition a two dimensional space by recursively subdividing it into four quadrants or regions. The regions may be square or rectangular, or may have arbitrary shapes. This data structure was named a quadtree by Raphael Finkel and J.L. Bentley in 1974. A similar partitioning is also known as a Q-tree. All forms of Quadtrees share some common features:

  • They decompose space into adaptable cells
  • Each cell (or bucket) has a maximum capacity. When maximum capacity is reached, the bucket splits
  • The tree directory follows the spatial decomposition of the Quadtree
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文