如何加快寻找传递闭包根节点的查询速度？

发布于 2024-10-21 04:30:25 字数 1841 浏览 2 评论 0原文

我有一个代表一棵树的历史传递闭包表。

create table TRANSITIVE_CLOSURE
  (
    CHILD_NODE_ID number not null enable,
    ANCESTOR_NODE_ID number not null enable,
    DISTANCE number not null enable,
    FROM_DATE date not null enable,
    TO_DATE date not null enable,
    constraint TRANSITIVE_CLOSURE_PK unique (CHILD_NODE_ID, ANCESTOR_NODE_ID, DISTANCE, FROM_DATE, TO_DATE)
  );

下面是一些示例数据：

CHILD_NODE_ID | ANCESTOR_NODE_ID | DISTANCE 
--------------------------------------------
1             | 1                | 0
2             | 1                | 1
2             | 2                | 0
3             | 1                | 2
3             | 2                | 1
3             | 3                | 0

不幸的是，我当前查找根节点的查询会导致全表扫描：

select *
from transitive_closure tc
where 
  distance = 0
  and not exists (
  select null
  from transitive_closure tci
  where tc.child_node_id = tci.child_node_id
    and tci.distance <> 0
);

从表面上看，它看起来并不太昂贵，但当我接近 100 万行时，这个特定的查询开始变得令人讨厌。 ..特别是当它是获取邻接树以获取遗留支持的视图的一部分时。

有没有更好的方法来找到传递闭包的根节点？我想重写所有旧的遗留代码，但我不能......所以我需要以某种方式构建邻接列表。获取除根节点之外的所有内容都很容易，那么有没有更好的方法呢？我是否以错误的方式思考这个问题？

对具有 800k 行的表的查询计划。

OPERATION                                  OBJECT_NAME        OPTIONS         COST 
SELECT STATEMENT                                                              2301 
  HASH JOIN                                                   RIGHT ANTI      2301 
    Access Predicates
      TC.CHILD_NODE_ID=TCI.CHILD_NODE_ID 
    TABLE ACCESS                           TRANSITIVE_CLOSURE FULL            961 
      Filter Predicates 
        TCI.DISTANCE = 1 
    TABLE ACCESS                           TRANSITIVE_CLOSURE FULL            962 
      Filter Predicates 
        DISTANCE=0

原文

I have a historical transitive closure table that represents a tree.

create table TRANSITIVE_CLOSURE
  (
    CHILD_NODE_ID number not null enable,
    ANCESTOR_NODE_ID number not null enable,
    DISTANCE number not null enable,
    FROM_DATE date not null enable,
    TO_DATE date not null enable,
    constraint TRANSITIVE_CLOSURE_PK unique (CHILD_NODE_ID, ANCESTOR_NODE_ID, DISTANCE, FROM_DATE, TO_DATE)
  );

Here's some sample data:

CHILD_NODE_ID | ANCESTOR_NODE_ID | DISTANCE 
--------------------------------------------
1             | 1                | 0
2             | 1                | 1
2             | 2                | 0
3             | 1                | 2
3             | 2                | 1
3             | 3                | 0

Unfortunately, my current query for finding the root node causes a full table scan:

select *
from transitive_closure tc
where 
  distance = 0
  and not exists (
  select null
  from transitive_closure tci
  where tc.child_node_id = tci.child_node_id
    and tci.distance <> 0
);

On the surface, it doesn't look too expensive, but as I approach 1 million rows, this particular query is starting to get nasty... especially when it's part of a view that grabs the adjacency tree for legacy support.

Is there a better way to find the root node of a transitive closure? I would like to rewrite all of our old legacy code, but I can't... so I need to build the adjacency list somehow. Getting everything except the root node is easy, so is there a better way? Am I thinking about this problem the wrong way?

Query plan on a table with 800k rows.

OPERATION                                  OBJECT_NAME        OPTIONS         COST 
SELECT STATEMENT                                                              2301 
  HASH JOIN                                                   RIGHT ANTI      2301 
    Access Predicates
      TC.CHILD_NODE_ID=TCI.CHILD_NODE_ID 
    TABLE ACCESS                           TRANSITIVE_CLOSURE FULL            961 
      Filter Predicates 
        TCI.DISTANCE = 1 
    TABLE ACCESS                           TRANSITIVE_CLOSURE FULL            962 
      Filter Predicates 
        DISTANCE=0

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

寂寞清仓 2024-10-28 04:30:25

查询执行需要多长时间，您希望执行多长时间？（您通常不想使用成本进行调优。很少有人知道解释计划成本的真正含义。）

在我的慢速桌面上，查询 800K 行只花了 1.5 秒。然后数据存入内存后 0.5 秒。你的情况是否变得更糟，
或者这个查询会非常频繁地运行吗？

我不知道您的数据是什么样的，但我猜想全表扫描始终最适合此查询。假设您的分层数据
比较浅，即0和1的距离很多，但100的距离很少，最重要的列不会很明显。这意味着
任何距离的索引条目都将指向大量的块。使用多块读取一次读取整个表会便宜得多
而不是一次一个块地读取大量内容。

另外，你说的历史是什么意思？您可以将此查询的结果存储在物化视图中吗？

另一个可能的想法是使用解析函数。这用排序代替了第二次表扫描。这种方法通常更快，但对我来说
查询实际上需要更长的时间，5.5 秒而不是 1.5 秒。但也许它在您的环境中会做得更好。

select * from
(
    select
        max(case when distance <> 0 then 1 else 0 end)
            over (partition by child_node_id) has_non_zero_distance
        ,transitive_closure.*
    from transitive_closure
)
where distance = 0
    and has_non_zero_distance = 0;

How long does the query take to execute, and how long do you want it to take? (You usually do not want to use the cost for tuning. Very few people know what the explain plan cost really means.)

On my slow desktop the query only took 1.5 seconds for 800K rows. And then 0.5 seconds after the data was in memory. Are you getting something significantly worse,
or will this query be run very frequently?

I don't know what your data looks like, but I'd guess that a full table scan will always be best for this query. Assuming that your hierarchical data
is relatively shallow, i.e. there are many distances of 0 and 1 but very few distances of 100, the most important column will not be very distinct. This means
that any of the index entries for distance will point to a large number of blocks. It will be much cheaper to read the whole table at once using multi-block reads
than to read a large amount of it one block at a time.

Also, what do you mean by historical? Can you store the results of this query in a materialized view?

Another possible idea is to use analytic functions. This replaces the second table scan with a sort. This approach is usually faster, but for me this
query actually takes longer, 5.5 seconds instead of 1.5. But maybe it will do better in your environment.

select * from
(
    select
        max(case when distance <> 0 then 1 else 0 end)
            over (partition by child_node_id) has_non_zero_distance
        ,transitive_closure.*
    from transitive_closure
)
where distance = 0
    and has_non_zero_distance = 0;

回复收藏 0 原文