查找包含单元格的一组区域的算法

发布于 2024-09-29 15:51:24 字数 220 浏览 5 评论 0原文

我正在处理一些电子表格数据,并且有一组具有任意边界的单元格区域。给定任何细胞,确定包含该细胞的区域子集的最快方法是什么?

目前,我所拥有的最好方法是对区域进行排序,其中主要排序字段是区域的起始行索引,然后是其结束行索引、起始列索引,然后是结束列索引。当我想基于给定单元格进行搜索时,我对起始行索引位于单元格行索引之后的第一个区域进行二分搜索,然后检查该区域之前的所有区域以查看它们是否包含该单元格,但这太慢了。

I am working with some spreadsheet data and I have a set of cell regions that are of arbitrary bounds. Given any cell, what is the fastest way to determine the subset of regions which contain the cell?

Currently, the best I have is to sort the regions with the primary sort field being the region's starting row index, followed by its ending row index, starting column index, and then ending column index. When I want to search based on a given cell, I binary search to the first region whose starting row index is after the cell's row index and then I check all regions before that one to see if they contain the cell, but this is too slow.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

愿得七秒忆 2024-10-06 15:51:24

根据一些谷歌搜索,这是二维点包围搜索问题或“刺伤问题”的示例。请参阅:

http://www.cs.nthu。 edu.tw/~wkhon/ds/ds10/tutorial/tutorial6.pdf

(从第 21/52 页开始):

http://www.cs.brown.edu/courses/cs252/misc/slides/orthsearch.pdf

涉及的关键数据结构是线段树:

http://en.wikipedia.org/wiki/Segment_tree

对于 2- D 情况,看起来你可以构建一个包含线段树的线段树并获得 O(log^2(n)) 查询复杂度。 (我认为你当前的解决方案是 O(n) 因为平均来说你只会通过二分搜索排除一半的区域。)

但是,你说的是“电子表格”,这意味着你可能有一个相对较小的工作区域和。更重要的是,你有整数坐标。您说“最快”,这意味着您可能愿意牺牲空间和设置时间来换取更快的查询。

你没有说哪个电子表格,但下面的代码是一个效率极低、但非常简单、强力的 Excel/VBA 二维查找表实现,一旦设置,查询复杂度为 O(1) :

Public Sub brutishButShort()
    Dim posns(1 To 65536, 1 To 256) As Collection

    Dim regions As Collection
    Set regions = New Collection

    Call regions.Add([q42:z99])
    Call regions.Add([a1:s100])
    Call regions.Add([r45])

    Dim rng As Range
    Dim cell As Range
    Dim r As Long
    Dim c As Long

    For Each rng In regions
        For Each cell In rng
            r = cell.Row
            c = cell.Column

            If posns(r, c) Is Nothing Then
                Set posns(r, c) = New Collection
            End If

            Call posns(r, c).Add(rng)
        Next cell
    Next rng

    Dim query As Range
    Set query = [r45]

    If Not posns(query.Row, query.Column) Is Nothing Then
        Dim result As Range
        For Each result In posns(query.Row, query.Column)
            Debug.Print result.address
        Next result
    End If
End Sub

如果您需要担心较大的网格或相对于网格较大的区域,则可以通过使用两个一维查找表来节省大量空间和设置时间。但是,您需要进行两次查找,并且需要获取两个结果集的交集。

Based on some Googling, this is an example of the two dimensional point enclosure searching problem, or the "stabbing problem". See:

http://www.cs.nthu.edu.tw/~wkhon/ds/ds10/tutorial/tutorial6.pdf

of here (starting at p.21/52):

http://www.cs.brown.edu/courses/cs252/misc/slides/orthsearch.pdf

The key data structure involved is the segment tree:

http://en.wikipedia.org/wiki/Segment_tree

For the 2-D case, it looks like you can build a segment tree containing segment trees and get O(log^2(n)) query complexity. (I think your current solution is O(n) since on average you'll just exclude half of your regions with your binary search.)

However, you said "spreadsheet", which means you've probably got a relatively small area to work with. More importantly, you've got integer coordinates. And you said "fastest", which means you're probably willing to trade space and setup time for a faster query.

You didn't say which spreadsheet, but the code below is a wildly-inefficient, but dirt-simple, brute-force Excel/VBA implementation of a 2-D lookup table that, once set up, has O(1) query complexity:

Public Sub brutishButShort()
    Dim posns(1 To 65536, 1 To 256) As Collection

    Dim regions As Collection
    Set regions = New Collection

    Call regions.Add([q42:z99])
    Call regions.Add([a1:s100])
    Call regions.Add([r45])

    Dim rng As Range
    Dim cell As Range
    Dim r As Long
    Dim c As Long

    For Each rng In regions
        For Each cell In rng
            r = cell.Row
            c = cell.Column

            If posns(r, c) Is Nothing Then
                Set posns(r, c) = New Collection
            End If

            Call posns(r, c).Add(rng)
        Next cell
    Next rng

    Dim query As Range
    Set query = [r45]

    If Not posns(query.Row, query.Column) Is Nothing Then
        Dim result As Range
        For Each result In posns(query.Row, query.Column)
            Debug.Print result.address
        Next result
    End If
End Sub

If you have a larger grid to worry about or regions that are large relative to the grid, you can save a ton of space and setup time by using two 1-D lookup tables instead. However, then you have two lookups, plus a need to take the intersection of the two resulting sets.

情泪▽动烟 2024-10-06 15:51:24

我想你想确定单元格和区域的相交是否为Nothing

Sub RegionsContainingCell(rCell As Range, ParamArray vRegions() As Variant)

    Dim i As Long

    For i = LBound(vRegions) To UBound(vRegions)
        If TypeName(vRegions(i)) = "Range" Then
            If Not Intersect(rCell, vRegions(i)) Is Nothing Then
                Debug.Print vRegions(i).Address
            End If
        End If
    Next i

End Sub

Sub test()

    RegionsContainingCell Range("B50"), Range("A1:Z100"), Range("C2:C10"), Range("B1:B70"), Range("A1:C30")

End Sub

I think you want to determine if the Intersect of the cell and the region is Nothing

Sub RegionsContainingCell(rCell As Range, ParamArray vRegions() As Variant)

    Dim i As Long

    For i = LBound(vRegions) To UBound(vRegions)
        If TypeName(vRegions(i)) = "Range" Then
            If Not Intersect(rCell, vRegions(i)) Is Nothing Then
                Debug.Print vRegions(i).Address
            End If
        End If
    Next i

End Sub

Sub test()

    RegionsContainingCell Range("B50"), Range("A1:Z100"), Range("C2:C10"), Range("B1:B70"), Range("A1:C30")

End Sub
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文