在图表中索引和查询路径的最有效方法
我有一个代表图形的表格:边缘(从,到)。
我想使用“路径查询”查询此表,仅检索路径的源和目的地。
例如,假设我的表包含以下行:
+------+----+
| from | to |
+------+----+
| a | b |
| b | c |
| c | d |
| f | g |
| b | f |
| c | a |
+------+----+
假设我执行以下(伪)查询:
SELECT ?v1, ?v2 WHERE ?v1 to ?t1, ?t1 to ?t2, ?t2 to ?v2;
这意味着我想要包含 4 个节点的所有路径中存在的所有源和目标对。执行此查询应返回以下结果:
+-----+-----+
| ?v1 | ?v2 |
+-----+-----+
| a | a |
| a | g |
| a | d |
+-----+-----+
当然,也可能需要由不同数量的节点组成的路径,数字 4 不是硬编码的:-)
我的问题是:
- 构建此类 SQL 查询的最佳方法是什么(请注意,我使用的是 SQLite,因此无法使用递归查询)。
- 我目前有一个用于 from 列的索引和一个用于 to 列的索引。这是最优的吗?我是否应该为“from, to”对创建一个索引?相反?
假设
没有自边(例如“a - a”)。
没有两个相同的行。
。
I have a table representing a graph: Edges(from, to).
I'd like to query this table with "path queries", retrieving only the source and destination of the path.
For example, assume my table consists of the following rows:
+------+----+
| from | to |
+------+----+
| a | b |
| b | c |
| c | d |
| f | g |
| b | f |
| c | a |
+------+----+
Assume I execute the following (pseudo-)query:
SELECT ?v1, ?v2 WHERE ?v1 to ?t1, ?t1 to ?t2, ?t2 to ?v2;
This means I want all the pairs of source and destination that exist in all paths consisting of 4 nodes. Executing this query should return the following results:
+-----+-----+
| ?v1 | ?v2 |
+-----+-----+
| a | a |
| a | g |
| a | d |
+-----+-----+
Of course, paths consisting of a different number of nodes might be needed as well, the number 4 isn't hardcoded :-)
My questions are:
- What's the best way to build such an SQL query (note that I'm using SQLite, so recursive queries can't be used).
- I currently have one index for the from column and one for the to column. Is this optimal? Should I create an index for the "from, to pair as well? Instead?
Assumptions
There are no self-edges (E.G "a - a").
There are no two identical rows.
Thanks in advance!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
广告 1。)
除非您事先知道您的路径将始终具有给定长度(或一小组给定长度),否则您无法用纯 sql 表达您的查询。但是,您可以选择逐步维护图的传递闭包,特别是如果
dong 等人的论文中概述了该技术,doi: //10.1.1.103.3314;不要被理论和数学吓倒,他们还提供现成的 SQL 代码,而且他们的基本思想很简单。
ad 2.)
如果维护传递闭包表是您的一个选择,那么它可以借给表示路径起始和结束顶点的一对列上的一个索引。
如果不是,您可能可以利用图表的结构:对于与扇入相比高(小)的平均扇出,您应该更好地在“to”(“from”)上建立索引) 柱子。
如果您无法对扇出/扇入比率做出假设,那么您最好在每列上建立一个索引。
希望它有帮助,
最诚挚的问候,卡斯滕
ad 1.)
unless you know in advance that your paths will always be of a given length (or a small set of given lengths), you cannot express your query in pure sql. however, you may choose to incrementally maintain the transitive closure of your graph, in particular if
the technique is outlined in a paper by dong et al., doi://10.1.1.103.3314; don't be daunted by the theory and the math, they also provide sql code ready-to-use and their basic ideas are straightforward.
ad 2.)
if maintaining a transitive closure table is an option for you it wpould lend to one index on the pair of columns representing start and end vertices of paths.
if it isn't you might be able exploit the structure of your graph: for average fan-outs that are high (small) in comparison to fan-ins you should be better of with an index on the 'to' ('from') column.
if you cannot make an assumption on fan-out/fan-in ratios you're probably best off with an index on each column.
hope it helps,
best regards, carsten