SQLite - 是否有 rtree 的替代方案来在轴上索引线?
我正在尝试创建一个具有起点和终点的位置数据库:基本上是一维轴上的线。我想有效地查询与给定间隔重叠的所有位置。在传统表中,查询需要两个不等式,因此无法建立索引。您还可以使用 R-Tree 索引,但它们似乎是为多维范围查询而设计的。有没有更有效的方法在轴上存储线?
如果有人好奇,数据库将存储基因组间隔。这是一个示例表:
CREATE TABLE lines (id INTEGER PRIMARY KEY, start INTEGER, stop INTEGER);
执行此操作的基本方法是:
SELECT * FROM lines WHERE start <= <end of interval> AND stop >= <start of interval>;
同样,这非常慢并且无法建立索引。 R 树的工作方式如下:
CREATE VIRTUAL TABLE lines_index USING RTREE (id, start, stop);
SELECT * from lines_index WHERE start <= <end of interval> AND stop >= <start of interval>;
R 树对于我们的实现来说并不理想,所以我想知道是否有任何替代方案......
I'm trying to make a database of positions with a start and stop: basically lines on a 1D axis. I want to efficiently query all positions that overlap a given interval. In a traditional table, the query would require two inequalities, so it cannot be indexed. You can also use an R-Tree index, but they seem designed for multidimensional range queries. Is there a more efficient way to store lines on an axis?
If anybody curious, the database is to store genome intervals. Here's an example table:
CREATE TABLE lines (id INTEGER PRIMARY KEY, start INTEGER, stop INTEGER);
The basic way to do this is:
SELECT * FROM lines WHERE start <= <end of interval> AND stop >= <start of interval>;
Again, that's really slow and can't be indexed. The R-Tree would work like this:
CREATE VIRTUAL TABLE lines_index USING RTREE (id, start, stop);
SELECT * from lines_index WHERE start <= <end of interval> AND stop >= <start of interval>;
R-Trees aren't ideal for our implementation, so I'm wondering if there are any alternatives...
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
首先,尽管您无法完全对其进行索引,但您可以仅按开始间隔进行索引。如果 90% 的间隔有 start=stop,那应该会有很大的改进。唯一的减速是间隔很长。
First of all, allthough you can't fully index it you could index by just the start interval. If 90% of the intervals have start=stop, that should make a big improvement. The only slowdown would be with very long intervals.