We don’t allow questions seeking recommendations for software libraries, tutorials, tools, books, or other off-site resources. You can edit the question so it can be answered with facts and citations.
Closed 9 years ago.
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
接受
或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
发布评论
评论(2)
我曾经使用一种自制的四叉树进行空间索引(早在我学习单词“四叉树”)。对于普通类型的空间数据(我处理街道地图数据),它们创建速度快,查询速度快,但在查询期间扫描太多叶节点。具体来说,在合理的节点大小 (50-100) 下,我的四叉树往往会为点查询生成大约 300 个结果,即应用 3-6 个叶节点(非常粗略的估计;结果变化很大。)
现在,我首选的数据结构是R*树。我自己编写并测试了一个实现,取得了非常好的结果。与我的四叉树代码相比,我构建 R*tree 的代码非常慢,但叶节点上的边界框最终组织得很好;至少一半的查询空间仅由一个叶节点回答(即,如果您进行随机点查询,则很有可能仅返回单个叶节点),并且大约 90% 的空间被覆盖两个或更少的节点。因此,对于 80 个元素的节点大小,我通常会从点查询中获得 80 或 160 个结果,平均值接近 160(因为一些查询确实返回 3-5 个节点)。即使在地图上密集的城市地区也是如此。
我知道这一点是因为我为 R* 树及其内部的图形对象编写了一个可视化工具,并在大型数据集(600,000 个路段)上对其进行了测试。它在点数据(以及边界框很少重叠的其他数据)上表现得更好。如果您实现 R* 树,我强烈建议您将结果可视化,因为当我编写我的树时,它有多个错误降低了树的效率(不影响正确性),并且我能够调整一些决策以获得更好的结果。请务必在大型数据集上进行测试,因为它将揭示小数据集无法揭示的问题。它可能有助于减少用于测试的树的扇出(节点大小),以查看树在几层深度时的工作情况。
我很乐意向您提供源代码,但需要获得雇主的许可。你知道是怎么回事。在我的实现中,我支持强制重新插入,但是我的 PickSplit 和
插入惩罚已被调整。
原始论文,R* 树:一种高效且稳健的点访问方法和矩形,由于某种原因缺少点(“i”上没有句点和点)。另外,他们的术语有点奇怪,例如,当他们说“边距”时,他们的意思是“周长”。
如果您需要可修改的数据结构,R* 树是一个不错的选择。如果您在首次创建树后不需要对其进行修改,请考虑批量加载算法。如果批量加载后只需要对树进行少量修改,普通的 R 树算法就足够了。请注意,R*-tree 和 R-tree 数据在结构上是相同的;只有插入(也许还有删除?我忘了)的算法不同。 R树是1984年的原始数据结构;这是R-的链接树纸。
kd-tree 看上去效率很高,实现起来也不是太难,但是只能用对于点数据。
顺便说一句,我如此关注叶节点的原因是
I used to use a kind of home-grown QuadTree for spatial indexing (well before I learned the word "quadtree"). For ordinary kinds of spatial data (I deal with street map data), they are fast to create and fast to query, but they scan too many leaf nodes during queries. Specifically, with reasonable node sizes (50-100), my quadtree tended to produce around 300 results for a point query, i.e. 3-6 leaf nodes apply (very rough ballpark; results are highly variable.)
Nowadays, my preferred data structure is the the R*tree. I wrote and tested an implementation myself that obtained very good results. My code for building an R*tree is very slow compared to my QuadTree code, but the bounding boxes on the leaf nodes end up very well organized; at least half of the query space is answered by only one leaf node (i.e. if you do a random point query, there is a good chance that only a single leaf node is returned), and something like 90% of the space is covered by two nodes or less. So with a node size of 80 elements, I'd typically get 80 or 160 results from a point query, with the average closer to 160 (since a few queries do return 3-5 nodes). This holds true even in dense urban areas of the map.
I know this because I wrote a visualizer for my R* tree and the graphical objects inside it, and I tested it on a large dataset (600,000 road segments). It performs even better on point data (and other data in which bounding boxes rarely overlap). If you implement an R* tree I urge you to visualize the results, because when I wrote mine it had multiple bugs that lowered the efficiency of the tree (without affecting correctness), and I was able to tweak some of the decision-making to get better results. Be sure to test on a large dataset, as it will reveal problems that a small dataset does not. It may help to decrease the fan-out (node size) of the tree for testing, to see how well the tree works when it is several levels deep.
I'd be happy to give you the source code except that I would need my employer's permission. You know how it is. In my implementation I support forced reinsertion, but my PickSplit and
insertion penalty have been tweaked.
The original paper, The R* tree: An Efficient and Robust Access Method for Points and Rectangles, is missing dots for some reason (no periods and no dots on the "i"s). Also, their terminology is a bit weird, e.g. when they say "margin", what they mean is "perimeter".
The R* tree is a good choice if you need a data structure that can be modified. If you don't need to modify the tree after you first create it, consider bulk loading algorithms. If you only need to modify the tree a small amount after bulk loading, ordinary R-tree algorithms will be good enough. Note that R*-tree and R-tree data is structurally identical; only the algorithms for insertion (and maybe deletion? I forget) are different. R-tree is the original data structure from 1984; here's a link to the R-tree paper.
The kd-tree looks efficient and not too difficult to implement, but it can only be used for point data.
By the way, the reason I focus on leaf nodes so much is that
我开发了一种基于象限的快速搜索算法,并于几年前将其发布在 ddj.com 上。也许您感兴趣:
加速搜索最近的线路
http://drdobbs.com/windows/198900559
I developed a algorithm for quadrant based fast search and publushed it on ddj.com a couple of years ago. Maybe it's interesting for you:
Accelerated Search For the Nearest Line
http://drdobbs.com/windows/198900559