Algorithm for fitting a grid over a set of disordered points
Given a large set (tens of thousands up to millions) of disordered points represented as 3D Cartesian vectors, what's a good algorithm for making a regular square grid (of user-defined spacing) that encloses all of the points? Some constraints:
- The grid needs to be square and regular
- I need to be able to adjust the grid spacing (the length of a side of one of the squares), ideally with a single variable
- I want a grid of minimum size, i.e. every 'block' in the grid should contain at least one of the disordered points, and every disordered point should be enclosed in a 'block'
- The return value of the algorithm should be the list of coordinates of the grid points
To illustrate in 2D, given this set of points:
for some grid spacing X, one possible return value of the algorithm would be the coordinates of these red points (dashed lines for illustration purposes only):
and for grid spacing X/2, one possible return value of the algorithm would be the coordinates of these red points (dashed lines for illustration purposes only):
For anyone who's interested, the disordered points that I'm working with are the atomic coordinates of large protein molecules, like what you can get out of a .pdb file.
Python is preferred for solutions, although pseudocode is also good.
EDIT: I think that my first description of what I needed was maybe a little fuzzy, so I added some constraints and images in order to clarify things.
Comments (6)
I'd suggest you make a k-d tree. It's fast-ish, simple, and easy to implement:
And Wikipedia code:
You'd have to slightly modify it, though, to fit within your constraints.
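The code listing this answer pointed to did not survive here. As a stand-in, this is a minimal sketch of the usual median-split k-d tree construction. It is not necessarily the listing the answer meant, and the names `Node` and `build_kdtree` are made up for illustration. Note that it partitions the points themselves and does not by itself give a regular grid.

```python
from collections import namedtuple

# One tree node: the splitting point, the axis it splits on, and two subtrees.
Node = namedtuple("Node", ["point", "axis", "left", "right"])

def build_kdtree(points, depth=0):
    """Recursively split the points at the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(
        point=ordered[mid],
        axis=axis,
        left=build_kdtree(ordered[:mid], depth + 1),
        right=build_kdtree(ordered[mid + 1:], depth + 1),
    )

tree = build_kdtree([(2, 3, 1), (5, 4, 2), (9, 6, 3), (4, 7, 4), (8, 1, 5), (7, 2, 6)])
```

Sorting at every level keeps the sketch short; for millions of points a selection-based median would probably be preferable.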
Because you are asking for a regular square grid of user-specified spacing, it sounds like a reasonably straightforward approach should work.
Start by passing through the data to work out the minimum and maximum co-ordinate in each dimension. Work out the number of steps of user-specified spacing required to cover the distance between maximum and minimum.
Pass through the data again to allocate each point to a cell in the grid, using a grid with a point at the minimum of each co-ordinate and the specified spacing (e.g. X_cell = Math.floor((x_i - x_min) / spacing)). Use a dictionary or an array to record the number of points in each cell.
Now print out the co-ordinates of the cells with at least one point in them.
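The two passes above could be sketched in Python roughly like this. The function names `occupied_cells` and `cell_corners` are my own, and I am assuming that reporting the lower corner of each occupied cell is an acceptable form for the "grid points"; if you need every corner of every occupied cell, you would add the eight index offsets per cell.

```python
import math
from collections import Counter

def occupied_cells(points, spacing):
    """Return (origin, counts); counts maps integer cell indices to point counts."""
    dims = len(points[0])
    # First pass: the grid origin is the minimum co-ordinate in each dimension.
    origin = tuple(min(p[d] for p in points) for d in range(dims))
    # Second pass: assign every point to a cell and count the points per cell.
    counts = Counter(
        tuple(math.floor((p[d] - origin[d]) / spacing) for d in range(dims))
        for p in points
    )
    return origin, counts

def cell_corners(origin, counts, spacing):
    """Co-ordinates of the lower corner of every cell that holds at least one point."""
    return sorted(
        tuple(origin[d] + idx[d] * spacing for d in range(len(origin)))
        for idx in counts
    )

points = [(0.0, 0.0, 0.0), (0.4, 0.2, 0.1), (2.5, 0.0, 0.0), (2.6, 3.1, 0.9)]
origin, counts = occupied_cells(points, spacing=1.0)
corners = cell_corners(origin, counts, 1.0)
print(corners)
```

A dictionary keyed by cell index only stores occupied cells, which matters when the bounding box of a protein is mostly empty space.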
You do have some freedom that I have not attempted to optimise: unless the distance between minimum and maximum co-ordinate is an exact multiple of the grid spacing, there will be some slop that allows you to slide the grid around and still have it contain all the points: at the moment the grid starts at the position of the lowest point, but it probably ends before the highest points, so you have room to move it down a little in each dimension. As you do this, some points will move from cell to cell, and the number of occupied cells will change.
If you consider only moves in one dimension at a time, you can work out what will happen reasonably efficiently. Work out the distance in that dimension between each point and the maximum co-ordinate in that dimension of its cell, and then sort these values. As you move the grid down, the point with the smallest distance to its maximum co-ordinate will swap cells first, and you can iterate through these points one by one by moving through them in sorted order. If you update the counts of points in cells as you do this you can work out which shift minimises the number of occupied cells.
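Here is a rough sketch of the one-dimension search. To keep it short it recounts the occupied cells for each candidate shift instead of updating the counts incrementally as described above, so it is slower (roughly quadratic in the number of points) but easier to check. The names `count_occupied` and `best_shift_along_axis` are hypothetical, and only the number of occupied cells is minimised.

```python
import math

def count_occupied(points, spacing, origin):
    """Number of distinct grid cells that contain at least one point."""
    dims = len(origin)
    return len({
        tuple(math.floor((p[d] - origin[d]) / spacing) for d in range(dims))
        for p in points
    })

def best_shift_along_axis(points, spacing, origin, axis):
    """Return (shift, n_cells) for the best downward shift of the origin on one axis."""
    # A point changes cell when the shift equals its distance to the upper
    # face of its current cell, so occupancy is constant between those values.
    breaks = {0.0, spacing}
    for p in points:
        offset = (p[axis] - origin[axis]) % spacing
        if offset > 0.0:
            breaks.add(spacing - offset)
    breaks = sorted(breaks)
    # Try "no shift" plus one representative shift inside each interval.
    candidates = [0.0] + [(a + b) / 2 for a, b in zip(breaks, breaks[1:])]

    def cells_for(shift):
        moved = list(origin)
        moved[axis] -= shift
        return count_occupied(points, spacing, moved)

    # min() keeps the first best candidate, so ties favour the smaller shift.
    return min(((s, cells_for(s)) for s in candidates), key=lambda sc: sc[1])

points = [(0.0, 0.0, 0.0), (1.9, 0.0, 0.0), (2.1, 0.0, 0.0)]
shift, n_cells = best_shift_along_axis(points, 1.0, (0.0, 0.0, 0.0), axis=0)
```

In this toy example the unshifted grid needs three cells along x, while shifting the origin down by about half a cell puts the points at 1.9 and 2.1 into the same cell and leaves two.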
Of course, you have three dimensions to worry about. You could work on them one at a time until you stop getting reductions in the number of cells. This gives a local minimum, which may not be the global minimum. One way to look for other local minima is to start again from a randomly chosen starting point.
How about a Voronoi diagram? It can be generated in O(n log n) using Fortune's algorithm. I don't know if it addresses your problem, but Voronoi diagrams are very "natural": they are very common in nature.
Example (from Wikipedia):
Find a minimum-area square that encloses all of the points. Repeatedly subdivide each square into 4 sub-squares (so going from 1 to 4 to 16 to 64 to …). Stop just before one of the squares becomes empty. It's not hard to prove that the resulting grid is at most four times as coarse as the optimal solution (key insight: an empty square is guaranteed to contain at least one square from any grid at least twice as fine).
Probably that constant can be reduced by introducing a random translation.
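A minimal sketch of the subdivision idea, written so that it works in 2D (squares into 4) or 3D (cubes into 8). It assumes the enclosing square is axis-aligned and anchored at the minimum corner of the bounding box, which is a simplification of "minimum-area square", and the function names are mine.

```python
import math

def all_cells_occupied(points, lo, side, n):
    """True if every cell of an n-per-axis grid over the square holds a point."""
    dims = len(lo)
    seen = {
        # Clamp so that points on the upper face fall into the last cell.
        tuple(min(n - 1, math.floor((p[d] - lo[d]) / side * n)) for d in range(dims))
        for p in points
    }
    return len(seen) == n ** dims

def finest_full_level(points):
    """Return (level, cell_size) for the deepest subdivision with no empty cell."""
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    side = max(max(p[d] for p in points) - lo[d] for d in range(dims))
    level = 0
    # Halve the cell size while the next level would still have no empty cell.
    while side > 0 and all_cells_occupied(points, lo, side, 2 ** (level + 1)):
        level += 1
    return level, side / 2 ** level

result = finest_full_level([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)])
```

The loop always ends because the number of cells eventually exceeds the number of points. Note that this picks the spacing for you, so it only fits the question if a power-of-two fraction of the bounding square is an acceptable spacing.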
I have experience with grid clustering in 2D and implemented an example in C# code.
http://kunuk.wordpress.com/2011/09/15/clustering-grid-cluster/
This can handle steps 1, 2 and 4.
You will have to modify the code and update it to 3D-space. Hope this gives you some ideas.
The code runs in O(m*n), where m is the number of grids and n is the number of points.
If you want the grid cells to be square and regular, you most likely want an octree. If you can relax the square and regular constraint, you can make a k-d tree.