查找包含单元格的一组区域的算法
我正在处理一些电子表格数据,并且有一组具有任意边界的单元格区域。给定任何细胞,确定包含该细胞的区域子集的最快方法是什么?
目前,我所拥有的最好方法是对区域进行排序,其中主要排序字段是区域的起始行索引,然后是其结束行索引、起始列索引,然后是结束列索引。当我想基于给定单元格进行搜索时,我对起始行索引位于单元格行索引之后的第一个区域进行二分搜索,然后检查该区域之前的所有区域以查看它们是否包含该单元格,但这太慢了。
I am working with some spreadsheet data and I have a set of cell regions that are of arbitrary bounds. Given any cell, what is the fastest way to determine the subset of regions which contain the cell?
Currently, the best I have is to sort the regions with the primary sort field being the region's starting row index, followed by its ending row index, starting column index, and then ending column index. When I want to search based on a given cell, I binary search to the first region whose starting row index is after the cell's row index and then I check all regions before that one to see if they contain the cell, but this is too slow.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
根据一些谷歌搜索,这是二维点包围搜索问题或“刺伤问题”的示例。请参阅:
http://www.cs.nthu。 edu.tw/~wkhon/ds/ds10/tutorial/tutorial6.pdf
(从第 21/52 页开始):
http://www.cs.brown.edu/courses/cs252/misc/slides/orthsearch.pdf
涉及的关键数据结构是线段树:
http://en.wikipedia.org/wiki/Segment_tree
对于 2- D 情况,看起来你可以构建一个包含线段树的线段树并获得 O(log^2(n)) 查询复杂度。 (我认为你当前的解决方案是 O(n) 因为平均来说你只会通过二分搜索排除一半的区域。)
但是,你说的是“电子表格”,这意味着你可能有一个相对较小的工作区域和。更重要的是,你有整数坐标。您说“最快”,这意味着您可能愿意牺牲空间和设置时间来换取更快的查询。
你没有说哪个电子表格,但下面的代码是一个效率极低、但非常简单、强力的 Excel/VBA 二维查找表实现,一旦设置,查询复杂度为 O(1) :
如果您需要担心较大的网格或相对于网格较大的区域,则可以通过使用两个一维查找表来节省大量空间和设置时间。但是,您需要进行两次查找,并且需要获取两个结果集的交集。
Based on some Googling, this is an example of the two dimensional point enclosure searching problem, or the "stabbing problem". See:
http://www.cs.nthu.edu.tw/~wkhon/ds/ds10/tutorial/tutorial6.pdf
of here (starting at p.21/52):
http://www.cs.brown.edu/courses/cs252/misc/slides/orthsearch.pdf
The key data structure involved is the segment tree:
http://en.wikipedia.org/wiki/Segment_tree
For the 2-D case, it looks like you can build a segment tree containing segment trees and get O(log^2(n)) query complexity. (I think your current solution is O(n) since on average you'll just exclude half of your regions with your binary search.)
However, you said "spreadsheet", which means you've probably got a relatively small area to work with. More importantly, you've got integer coordinates. And you said "fastest", which means you're probably willing to trade space and setup time for a faster query.
You didn't say which spreadsheet, but the code below is a wildly-inefficient, but dirt-simple, brute-force Excel/VBA implementation of a 2-D lookup table that, once set up, has O(1) query complexity:
If you have a larger grid to worry about or regions that are large relative to the grid, you can save a ton of space and setup time by using two 1-D lookup tables instead. However, then you have two lookups, plus a need to take the intersection of the two resulting sets.
我想你想确定单元格和区域的相交是否为Nothing
I think you want to determine if the Intersect of the cell and the region is Nothing