地理空间索引划分查询

发布于 2024-11-17 09:25:19 字数 481 浏览 10 评论 0原文

我正在研究使用类似 geohash 的索引(也许使用希尔伯特曲线)来存储地理空间信息。我的问题是关于如何最好地分割这样一个索引的区域查询。

这个例如,文章展示了如何将区域查询拆分为多个查询,以避免查询表现出不良局部性的范围(请参阅图像)。如果您想使用 Z 曲线(如普通的 geohash)通过单个查询来搜索圆形区域,则必须查询整个左下象限,该象限只占我们关注的区域的一小部分。

在这种情况下,最好将搜索分成几个查询,但是我无法找到有关如何最好地执行此操作的任何信息。是否有算法可以将这样的范围查询拆分为覆盖原始区域的较小范围?

I'm looking into storing geospatial information using a geohash-like index, perhaps using Hilbert curves. My question is regarding how best to split up area queries on such an index.

This article for example shows how one might want to split an area query into multiple queries to avoid quering a range exhibiting poor locality (see this image). If you wanted to search the circular area with a single query using a Z curve like a normal geohash you would have to query the entire lower left quadrant which has only a tiny fraction of the area we're concerned with.

In this case it would be better to split the search into a few queries, however I haven't been able to find any information on how best to do this. Are there algorithms for splitting a range query like this into smaller ranges which cover the original area?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

栩栩如生 2024-11-24 09:25:20

一旦确定了覆盖查询范围的哈希前缀,您就可以开始将该前缀拆分为组成前缀,并在保留之前测试每个前缀是否与您的查询范围相交。例如,假设您已将前缀 0100 识别为覆盖您的查询区域。前缀 0100 由前缀 01000 和 01001 组成,而前缀 01000 由前缀 010000 和 010001 组成,前缀 01001 由前缀 010010 和 010011 组成,等等。当您将前缀重写为较大前缀的集合时(对应于较小的区域),您可以过滤掉这些前缀不与您的查询范围相交。你必须在某个时刻停止分裂过程;每次拆分迭代都可能使前缀集合的大小加倍。例如,您可以创建最大前缀集合大小,此时您可以声明对过滤感到满意;当然,您还可以使用其他指标来找到停止点。最后一步,您可以重新组合“相邻”前缀,以减少正在执行的搜索数量。例如,如果您只剩下前缀 01000 和 01001,则可以将它们组合到 0100 中,以避免先搜索 01000,然后再搜索 01001(假设搜索过程的开销超出了顺序读取的范围,这是一个好处) 。您需要一个例程来计算哈希前缀的边界框,以便测试与查询边界的交集。这将取决于您使用的哈希方案。

Once you've identified a hash prefix that covers your query bounds, you can begin splitting that prefix into constituent prefixes and testing whether each intersects your query bounds before keeping it. For example, say you've identified the prefix 0100 as covering your query area. The prefix 0100 comprises the prefixes 01000 and 01001, while the prefix 01000 comprises the prefixes 010000 and 010001, and the prefix 01001 comprises the prefixes 010010 and 010011, etc. While you're rewriting your prefix as a collection of larger prefixes (corresponding to smaller areas), you can filter out those prefixes that do not intersect your query bounds. You'll have to stop the splitting process at some point; each iteration of splitting potentially doubles the size of your prefix collection. You might create, for instance, a maximum prefix collection size at which point you declare satisfaction with your filtering; of course, there are other metrics that you could use to find a stopping point. As a final step, you can re-combine "adjacent" prefixes in order to reduce the number of searches that you are performing. If, for instance, you're left with the prefixes 01000 and 01001, you can combine these into 0100 to avoid a search for 01000 followed by a search for 01001 (a benefit under the assumption that the search process has overhead beyond sequential reads). You'll need a routine for calculating the bounding box of a hash prefix in order to test for intersection with your query bounds. This will depend upon the hashing scheme that you use.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文