如何使用KDTree进行任意维度的top-k查询和范围查询

发布于 2024-09-10 06:30:26 字数 255 浏览 8 评论 0原文

我使用了KD-tree(libkdtree++)来存储一个多维数据集,这里的要求是这个数据集可以支持不同维度上的top-k/range查询。例如,KDTree<3,Point>。树:查找具有最高 Point[1](y 轴)值的前 100 个点。

从libkdtree++的实现来看,类似的是“find_within_range”函数,但它是根据“曼哈顿距离”来计算的,这里等于max(x_dist, max(y_dist, z_dist))。如何在一维上使用范围查询?

I have used KD-tree(libkdtree++) to store a multi-dimensional data set, and the requirements here is this data set can support top-k/range queries on different dimensions. For example, a KDTree<3, Point> tree: to find the top 100 points whose have highest Point[1](y axis) values.

From the implementation of libkdtree++, what's similar is the "find_within_range" functions, however it is counted based on "Manhattan distance", which equals max(x_dist, max(y_dist, z_dist)) here. How can I just use range query on one dimension?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

晨敛清荷 2024-09-17 06:30:26

看看代码,看起来你不能以简单的方式做到这一点,这很可笑。如果我是你,我会很想破解这个库或者编写我自己的 kd 树。我会在他们的邮件列表上询问以确定,但看起来你可能必须做这样的事情:

kdtreetype::_Region_ r(point_with_min_y);
r.set_low_bound(min_x, 0);
r.set_high_bound(max_x, 0);
r.set_low_bound(min_z, 2);
r.set_high_bound(max_z, 2);
r.set_high_bound((min_y + max_y) / 2, 1);

double search_min = min_y, search_max = max_y;

// binary search to get 100 points
int c;
while (c = tree.count_within_range(r) != 100) {
    if (c > 100) search_max = (search_min + search_max) / 2;
    else         search_min = (search_min + search_max) / 2;
    r.set_high_bound((search_min + search_max) / 2);
}

tree.visit_within_range(r, process_min_y_point);

这是对 Y 的极其低效的二分搜索,其中 count(y <= Y) == 100我对图书馆不熟悉,但这是我粗略检查过的最好的图书馆。

Looking at the code, it looks like you can't do that in a straightforward way, ridiculously enough. If I were you I'd be tempted to either hack the library or write my own kd-tree. I'd ask on their mailing list to be sure, but it looks like you might have to do something like this:

kdtreetype::_Region_ r(point_with_min_y);
r.set_low_bound(min_x, 0);
r.set_high_bound(max_x, 0);
r.set_low_bound(min_z, 2);
r.set_high_bound(max_z, 2);
r.set_high_bound((min_y + max_y) / 2, 1);

double search_min = min_y, search_max = max_y;

// binary search to get 100 points
int c;
while (c = tree.count_within_range(r) != 100) {
    if (c > 100) search_max = (search_min + search_max) / 2;
    else         search_min = (search_min + search_max) / 2;
    r.set_high_bound((search_min + search_max) / 2);
}

tree.visit_within_range(r, process_min_y_point);

This is a horribly inefficient binary search for the Y at which count(points with y <= Y) == 100. I'm not familiar with the library, but that's the best I've got on a cursory inspection.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文