How can I speed up a query that finds the root nodes of a transitive closure?
I have a historical transitive closure table that represents a tree.
create table TRANSITIVE_CLOSURE
(
    CHILD_NODE_ID    number not null enable,
    ANCESTOR_NODE_ID number not null enable,
    DISTANCE         number not null enable,
    FROM_DATE        date   not null enable,
    TO_DATE          date   not null enable,
    constraint TRANSITIVE_CLOSURE_PK
        unique (CHILD_NODE_ID, ANCESTOR_NODE_ID, DISTANCE, FROM_DATE, TO_DATE)
);
Here's some sample data:
CHILD_NODE_ID | ANCESTOR_NODE_ID | DISTANCE
--------------------------------------------
1             | 1                | 0
2             | 1                | 1
2             | 2                | 0
3             | 1                | 2
3             | 2                | 1
3             | 3                | 0
Unfortunately, my current query for finding the root node causes a full table scan:
select *
from transitive_closure tc
where distance = 0
  and not exists (
      select null
      from transitive_closure tci
      where tc.child_node_id = tci.child_node_id
        and tci.distance <> 0
  );
On the surface, it doesn't look too expensive, but as I approach 1 million rows, this particular query is starting to get nasty... especially when it's part of a view that grabs the adjacency tree for legacy support.
Is there a better way to find the root node of a transitive closure? I would like to rewrite all of our old legacy code, but I can't... so I need to build the adjacency list somehow. Getting everything except the root node is easy, so is there a better way? Am I thinking about this problem the wrong way?
Query plan on a table with 800k rows.
OPERATION                               OBJECT_NAME         OPTIONS     COST
SELECT STATEMENT                                                        2301
  HASH JOIN                                                 RIGHT ANTI  2301
    Access Predicates
      TC.CHILD_NODE_ID=TCI.CHILD_NODE_ID
    TABLE ACCESS                        TRANSITIVE_CLOSURE  FULL        961
      Filter Predicates
        TCI.DISTANCE = 1
    TABLE ACCESS                        TRANSITIVE_CLOSURE  FULL        962
      Filter Predicates
        DISTANCE=0
Comments (3)
How long does the query take to execute, and how long do you want it to take? (You usually do not want to use the cost for tuning. Very few people know what the explain plan cost really means.) On my slow desktop the query took only 1.5 seconds for 800K rows, and 0.5 seconds once the data was in memory. Are you getting something significantly worse, or will this query be run very frequently?
I don't know what your data looks like, but I'd guess that a full table scan will always be best for this query. Assuming that your hierarchical data is relatively shallow, i.e. there are many rows with a distance of 0 or 1 but very few with a distance of 100, the most important column, DISTANCE, will not be very selective. This means that the index entries for any given distance will point to a large number of blocks. It will be much cheaper to read the whole table at once using multi-block reads than to read a large part of it one block at a time.
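To check how selective DISTANCE actually is, a count per value can help. This is only a sketch against the table from the question; the interpretation (a few values covering most rows favors a full scan) follows the reasoning above.

```sql
-- Row count per DISTANCE value, with each value's share of the table.
-- If distance 0 and 1 account for most rows, an index on DISTANCE
-- is unlikely to beat a full table scan for this query.
select distance,
       count(*)                                          as row_count,
       round(100 * ratio_to_report(count(*)) over (), 1) as pct_of_table
from transitive_closure
group by distance
order by distance;
```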
Also, what do you mean by historical? Can you store the results of this query in a materialized view?
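If a materialized view is an option, it might look roughly like the sketch below. The name ROOT_NODES_MV and the choice of a complete, on-demand refresh are assumptions for illustration; whether stale results between refreshes are acceptable depends on how the table is maintained.

```sql
-- Hypothetical materialized view holding the result of the root-node query.
-- ROOT_NODES_MV is an assumed name; refresh strategy is an assumption too.
create materialized view ROOT_NODES_MV
  build immediate
  refresh complete on demand
as
select tc.child_node_id, tc.ancestor_node_id, tc.distance,
       tc.from_date, tc.to_date
from transitive_closure tc
where tc.distance = 0
  and not exists (
      select null
      from transitive_closure tci
      where tc.child_node_id = tci.child_node_id
        and tci.distance <> 0
  );

-- Re-run the query and replace the stored rows ('C' = complete refresh).
begin
  dbms_mview.refresh('ROOT_NODES_MV', method => 'C');
end;
/
```

The legacy view could then read from ROOT_NODES_MV instead of running the anti-join each time.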
Another possible idea is to use analytic functions, which replace the second table scan with a sort. This approach is usually faster, but for me the query actually takes longer: 5.5 seconds instead of 1.5. It may do better in your environment.
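The analytic query the answer refers to is not preserved here. One way such a query might be written is sketched below: it counts, per CHILD_NODE_ID, the rows whose distance is not 0, and keeps the distance-0 rows where that count is zero. This is an illustration of the technique, not necessarily the query that was timed.

```sql
-- Single pass over the table: the window count replaces the NOT EXISTS subquery.
select child_node_id, ancestor_node_id, distance, from_date, to_date
from (
    select tc.*,
           -- number of rows for this node that point at a real ancestor
           count(case when tc.distance <> 0 then 1 end)
               over (partition by tc.child_node_id) as ancestor_rows
    from transitive_closure tc
)
where distance = 0
  and ancestor_rows = 0;
```

On the sample data only node 1 has no rows with a non-zero distance, so only its distance-0 row is returned, matching the original query.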
Can you try adding an index on (distance, child_node_id), or changing the order of these columns in the existing unique index? I think the outer query could then access the table through the index on distance, while the inner query would only need to access the index.
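A sketch of that suggestion follows. The index name TRANSITIVE_CLOSURE_DIST_IX is made up for illustration, and whether the optimizer actually uses the index depends on the data distribution, so it is worth checking the plan afterwards.

```sql
-- Hypothetical index covering both columns the root-node query filters on.
create index TRANSITIVE_CLOSURE_DIST_IX
    on TRANSITIVE_CLOSURE (DISTANCE, CHILD_NODE_ID);

-- Check whether the plan changes for the original query.
explain plan for
select *
from transitive_closure tc
where distance = 0
  and not exists (
      select null
      from transitive_closure tci
      where tc.child_node_id = tci.child_node_id
        and tci.distance <> 0
  );

select * from table(dbms_xplan.display);
```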
Add ONE root node from which all your current root nodes are descended. Then you would simply query the children of your one root. Problem solved.
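A rough sketch of how this could be done is below. Several things here are assumptions not stated in the question: node id 0 is unused and can serve as the super-root, each node has a single validity period (so its depth is simply its largest DISTANCE), and the placeholder dates on the super-root's own row are acceptable. With real history in FROM_DATE and TO_DATE the linking step would need more care.

```sql
-- Assumed: id 0 is free to use as the synthetic super-root.
-- Link every existing node to it: distance = depth below its current root + 1.
insert into transitive_closure
    (child_node_id, ancestor_node_id, distance, from_date, to_date)
select child_node_id, 0, max(distance) + 1, min(from_date), max(to_date)
from transitive_closure
group by child_node_id;

-- Self row for the super-root (placeholder dates are an assumption).
insert into transitive_closure
    (child_node_id, ancestor_node_id, distance, from_date, to_date)
values (0, 0, 0, date '1900-01-01', date '9999-12-31');

-- The former root nodes are now the direct children of the super-root.
select child_node_id
from transitive_closure
where ancestor_node_id = 0
  and distance = 1;
```

On the sample data this adds rows (1, 0, 1), (2, 0, 2) and (3, 0, 3), so the final query returns node 1. Since the existing unique constraint leads with CHILD_NODE_ID, an index starting with ANCESTOR_NODE_ID would probably be needed for this lookup to avoid a full scan.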