Is there an existing database (preferably an embedded one) that supports multi-dimensional search over many dimensions?
I would like to build a C++ application on top of a database that supports multi-dimensional search (e.g. a KD-tree or an R-tree). SQLite with R-tree enabled only supports up to 5 dimensions, which is far fewer than I need. Any suggestions?
Comments (1)
Well, do the queries actually make sense this way? Euclidean distance is a reasonable distance for 2D and 3D geometric data. But even for "space-time" it does not really make sense, because 1 second is something entirely different from 1 meter.
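To make the unit problem concrete, here is a minimal sketch of one possible workaround: dividing each axis by a scale you choose before taking the Euclidean distance. The function name `scaled_distance` and the `scale` vector are made up for illustration, and picking the scales is exactly the "which distance is reasonable" decision; nothing here says this is the right distance for your data.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: Euclidean distance after dividing each axis by a
// user-chosen scale, so that one "unit" is comparable across axes
// (e.g. meters on the spatial axes, seconds on the time axis).
double scaled_distance(const std::vector<double>& a,
                       const std::vector<double>& b,
                       const std::vector<double>& scale) {
    assert(a.size() == b.size() && a.size() == scale.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        assert(scale[i] > 0.0);  // a zero or negative scale is meaningless
        const double d = (a[i] - b[i]) / scale[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}
```

With all scales set to 1 this is the plain Euclidean distance. Changing a single scale changes which points count as "near", which is why the choice has to come from the application and not from the index.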
First decide which type of distance is reasonable, then think about which index is appropriate. Depending on your queries, you will want very different indexes.
There is no "one size fits all" here. An index that performs well for one task and one data set will likely be worse than a linear scan - in particular at high dimensionality - on another.
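For reference, the linear scan that any index has to beat is only a few lines. This is a minimal sketch assuming points are stored as `std::vector<double>` and squared Euclidean distance is the measure you settled on; the names `Point`, `squared_distance` and `nearest_linear_scan` are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

// Squared Euclidean distance; the square root is not needed for ranking.
double squared_distance(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Returns the index of the point closest to `query`,
// or points.size() if `points` is empty.
std::size_t nearest_linear_scan(const std::vector<Point>& points,
                                const Point& query) {
    std::size_t best = points.size();
    double best_dist = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < points.size(); ++i) {
        const double d = squared_distance(points[i], query);
        if (d < best_dist) {
            best_dist = d;
            best = i;
        }
    }
    return best;
}
```

Timing a candidate index against something like this on your own data and queries is a simple way to check whether the index pays off at your dimensionality.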
Dynamic data and static data are again two very different things. Maintaining a well-balanced tree dynamically is a lot harder than bulk-loading an R-tree with STR (Sort-Tile-Recursive) and just querying it with window queries. The bulk-loading approach is not much code; a good coder should be able to do it in a few days.
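To give a feel for how little code the static case needs, here is a rough sketch of STR packing, simplified on purpose: 2D points only, and only the leaf level is built (a full R-tree would pack the leaf rectangles into upper levels the same way). The window query just tests every leaf rectangle. The type and function names are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point2 { double x, y; };

struct Leaf {
    double min_x, min_y, max_x, max_y;  // bounding rectangle of the leaf
    std::vector<Point2> points;
};

// STR leaf packing: sort by x, cut into vertical slices,
// sort each slice by y, then cut each slice into leaves.
std::vector<Leaf> str_pack_leaves(std::vector<Point2> pts, std::size_t capacity) {
    assert(capacity > 0);
    std::vector<Leaf> leaves;
    const std::size_t n = pts.size();
    if (n == 0) return leaves;

    const std::size_t leaf_count = (n + capacity - 1) / capacity;
    const std::size_t slices = static_cast<std::size_t>(
        std::ceil(std::sqrt(static_cast<double>(leaf_count))));
    const std::size_t slice_size = slices * capacity;  // points per slice

    std::sort(pts.begin(), pts.end(),
              [](const Point2& a, const Point2& b) { return a.x < b.x; });

    for (std::size_t s = 0; s < n; s += slice_size) {
        const std::size_t s_end = std::min(n, s + slice_size);
        std::sort(pts.begin() + static_cast<std::ptrdiff_t>(s),
                  pts.begin() + static_cast<std::ptrdiff_t>(s_end),
                  [](const Point2& a, const Point2& b) { return a.y < b.y; });
        for (std::size_t i = s; i < s_end; i += capacity) {
            const std::size_t i_end = std::min(s_end, i + capacity);
            Leaf leaf{pts[i].x, pts[i].y, pts[i].x, pts[i].y, {}};
            for (std::size_t j = i; j < i_end; ++j) {
                leaf.min_x = std::min(leaf.min_x, pts[j].x);
                leaf.max_x = std::max(leaf.max_x, pts[j].x);
                leaf.min_y = std::min(leaf.min_y, pts[j].y);
                leaf.max_y = std::max(leaf.max_y, pts[j].y);
                leaf.points.push_back(pts[j]);
            }
            leaves.push_back(leaf);
        }
    }
    return leaves;
}

// Window query: skip leaves whose rectangle misses the window,
// then test the points of the remaining leaves individually.
std::vector<Point2> window_query(const std::vector<Leaf>& leaves,
                                 double x0, double y0, double x1, double y1) {
    std::vector<Point2> hits;
    for (const Leaf& leaf : leaves) {
        if (leaf.max_x < x0 || leaf.min_x > x1 ||
            leaf.max_y < y0 || leaf.min_y > y1) continue;
        for (const Point2& p : leaf.points)
            if (p.x >= x0 && p.x <= x1 && p.y >= y0 && p.y <= y1)
                hits.push_back(p);
    }
    return hits;
}
```

Extending this to more dimensions means repeating the sort-and-slice step once per axis. Whether the resulting rectangles still prune anything at high dimensionality is a separate question that only measurements on your data can answer.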
You might want to read up on the problems of high-dimensional data, e.g. this rather balanced article on the "curse of dimensionality". There are plenty of articles saying "you can't index high-dimensional data" as an excuse for failing to do so; this one at least gives you some examples of when you can and when you cannot.
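One effect usually discussed under the "curse of dimensionality" is that distances concentrate: for independent, uniformly distributed coordinates, the nearest and the farthest point end up almost equally far from the query as the dimension grows. The sketch below measures this on synthetic data; `relative_contrast` is a name invented here, and uniform random data is a worst-case assumption that real, correlated data often does not match.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Relative contrast (d_max - d_min) / d_min of the Euclidean distances from
// one random query point to n random points, all uniform in [0,1]^dim.
// Values near 0 mean "nearest" and "farthest" are barely distinguishable.
double relative_contrast(std::size_t dim, std::size_t n, unsigned seed) {
    assert(dim > 0 && n > 0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);

    std::vector<double> query(dim);
    for (double& q : query) q = uniform(rng);

    double d_min = std::numeric_limits<double>::max();
    double d_max = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        for (std::size_t k = 0; k < dim; ++k) {
            const double diff = uniform(rng) - query[k];
            sum += diff * diff;
        }
        const double d = std::sqrt(sum);
        d_min = std::min(d_min, d);
        d_max = std::max(d_max, d);
    }
    return (d_max - d_min) / d_min;
}
```

Comparing, say, `relative_contrast(2, 1000, 42)` with `relative_contrast(1000, 1000, 42)` shows the contrast shrinking sharply. Running the same measurement on a sample of your real data gives a quick hint of whether a distance-based index has anything to work with.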