Oracle对非分层数据的分层查询

发布于 2024-12-08 10:58:23 字数 2184 浏览 0 评论 0原文

我在 Oracle 表中拥有数据,该表被组织为可以包含循环的图形(请参阅示例)。

     CREATE TABLE T (parent INTEGER, child INTEGER)
               AS select 1 parent, 2 child from dual
        union all select 1 parent, 8 child from dual
        union all select 2 parent, 3 child from dual
        union all select 2 parent, 4 child from dual
        union all select 2 parent, 8 child from dual
        union all select 3 parent, 4 child from dual
        union all select 3 parent, 6 child from dual
        union all select 4 parent, 5 child from dual
        union all select 5 parent, 8 child from dual
        union all select 6 parent, 5 child from dual
        union all select 7 parent, 3 child from dual
        union all select 7 parent, 5 child from dual
        union all select 8 parent, 6 child from dual

Data Sample

我的目标是获取以下节点的所有后代节点(子节点、子节点的子节点等)节点 X。假设为 2。我的预期结果是:3,4,5,6,8。

我知道我可以设计这样的查询:

SELECT child, sys_connect_by_path(child,'/')
   FROM T
  START WITH parent = 2
CONNECT BY NOCYCLE PRIOR child = PARENT;

这样的查询的问题是它将遍历所有可能的路径,直到它们循环为止,并且有方法我的实际数据中太多了。结果由许多重复项组成 - 如下:

child | sys_connect_by_path (for information)
3     | /3
4     | /3/4
5     | /3/4/5
8     | /3/4/5/8
6     | /3/4/5/8/6
6     | /3/6
5     | /3/6/5
8     | /3/6/5/8
4     | /4
5     | /4/5
8     | /4/5/8
6     | /4/5/8/6
8     | /8
6     | /8/6
5     | /8/6/5

我的实际数据要复杂得多。执行这样一个查询的成本是如此之大,以至于我的可自动扩展的 TEMP 表空间达到了 10Gb(最初为 500 Mb),而且我的数据库实际上由于磁盘已满而崩溃了。

我尝试设计这样的查询(递归WITH子句):

WITH descendants(node) AS
( SELECT 2 node FROM dual
  UNION ALL
  (
  SELECT child
    FROM T
   INNER JOIN descendants D
      ON T.parent = D.node
   MINUS SELECT node FROM descendants
  )
)
SELECT * FROM descendants

我遇到的问题是:

  • 对于Oracle 10g,这没有实现(ORA-32033:不支持的列别名,并且一些客户使用Oracle 9 或 10),
  • 使用 Oracle 11g,我得到ORA-32041:递归WITH 子句中的 UNION ALL 操作必须只有两个分支。如果删除 MINUS 子句,我将得到循环(ORA-32044:执行递归WITH查询时检测到循环)。

您将如何查询我的原始数据以有效地获取节点 3、4、5、6、8? PL/SQL 解决方案也受到欢迎。

谢谢。

I hava data in an Oracle table that is organized as a graph that can contain cycles (see example).

     CREATE TABLE T (parent INTEGER, child INTEGER)
               AS select 1 parent, 2 child from dual
        union all select 1 parent, 8 child from dual
        union all select 2 parent, 3 child from dual
        union all select 2 parent, 4 child from dual
        union all select 2 parent, 8 child from dual
        union all select 3 parent, 4 child from dual
        union all select 3 parent, 6 child from dual
        union all select 4 parent, 5 child from dual
        union all select 5 parent, 8 child from dual
        union all select 6 parent, 5 child from dual
        union all select 7 parent, 3 child from dual
        union all select 7 parent, 5 child from dual
        union all select 8 parent, 6 child from dual

Data sample

My goal is to get all nodes that are descendants (children, children of children, etc.) of node X. Let's say 2. My expected result is then: 3, 4, 5, 6, 8.

I know that I can design a query like this:

SELECT child, sys_connect_by_path(child,'/')
   FROM T
  START WITH parent = 2
CONNECT BY NOCYCLE PRIOR child = PARENT;

The problem with such a query is that it will go through all possible paths until they cycle, and there are way too many of them in my actual data. The result consists of many duplicates – Here it is:

child | sys_connect_by_path (for information)
3     | /3
4     | /3/4
5     | /3/4/5
8     | /3/4/5/8
6     | /3/4/5/8/6
6     | /3/6
5     | /3/6/5
8     | /3/6/5/8
4     | /4
5     | /4/5
8     | /4/5/8
6     | /4/5/8/6
8     | /8
6     | /8/6
5     | /8/6/5

My actual data is much more complex. the cost of execution of such a query is so huge that my TEMP tablespace, which was autoextendable, reached 10Gb (originally 500 Mb) and my database actually broke because of disk full.

I tried to design the query like this (recursive WITH clause) :

WITH descendants(node) AS
( SELECT 2 node FROM dual
  UNION ALL
  (
  SELECT child
    FROM T
   INNER JOIN descendants D
      ON T.parent = D.node
   MINUS SELECT node FROM descendants
  )
)
SELECT * FROM descendants

The problem that I encounter is:

  • with Oracle 10g, this is not implemented (ORA-32033: unsupported column aliasing, and some customers use Oracle 9 or 10),
  • with Oracle 11g, I get ORA-32041: UNION ALL operation in recursive WITH clause must have only two branches. If I remove the MINUS clause I will get cycles (ORA-32044: cycle detected while executing recursive WITH query).

How would you query my original data to get those nodes 3, 4, 5, 6, 8 efficiently? PL/SQL solutions are also welcome.

Thank you.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

逆夏时光 2024-12-15 10:58:23

您期望到达任何子节点的最大深度是多少?

如果它相对较小,您可以循环下来,同时检查您已经访问过的节点,以类似这样的方式...

(注意,我不是 Oracle 专家,所以这更接近带有一点真正 SQL 的伪代码混合)

CREATE TABLE myMap (parent INT, child INT);

INSERT INTO myTable SELECT NULL, 2 FROM DUAL;

WHILE (SQL%ROWCOUNT > 0)
LOOP

  INSERT INTO
    myMap
  SELECT DISTINCT
    dataMap.parent,
    dataMap.child
  FROM
    myMap
  INNER JOIN
    dataMap
      ON myMap.child = dataMap.parent
  WHERE
    NOT EXISTS (SELECT * FROM myMap WHERE parent = dataMap.parent)

END LOOP;

根据性能,您可能还需要 myMap 中的 深度 字段;优化连接,以便仅连接最近的节点。这意味着两个索引;一种用于 JOIN (深度),另一种用于 NOT EXISTS (父级)

编辑

添加了 DISTINCT 关键字,以避免出现以下情况...
- 节点 2 映射到 3 和 4
- 节点 3 和 4 都映射到节点 5
- 节点 5 的所有子节点现在将被处理两次

GROUP BY 或许多其他选项,可用于满足此目的,而不是 DISTINCT。只是 NOT EXISTS 本身是不够的。

What is your expected maximum depth to reach any child node?

If it's relatively small, you could loop down, while checking for nodes you have already visited, in a manner something like this...

(Note, I'm not an Oracle expert so this is closer to pseudo code with a little real SQL mixed in)

CREATE TABLE myMap (parent INT, child INT);

INSERT INTO myTable SELECT NULL, 2 FROM DUAL;

WHILE (SQL%ROWCOUNT > 0)
LOOP

  INSERT INTO
    myMap
  SELECT DISTINCT
    dataMap.parent,
    dataMap.child
  FROM
    myMap
  INNER JOIN
    dataMap
      ON myMap.child = dataMap.parent
  WHERE
    NOT EXISTS (SELECT * FROM myMap WHERE parent = dataMap.parent)

END LOOP;

Depending on performance, you may also want a depth field in myMap; optimising the join so as to only join on the most recent nodes. This would imply two indexes; one for the JOIN (depth) and one for the NOT EXISTS (parent).

EDIT

Added the DISTINCT key word, to avoid the following case...
- Node 2 maps to 3 and 4
- Nodes 3 and 4 both map to node 5
- All children of node 5 would now be processed twice

GROUP BY, or many other options, can be used to cater for this instead of DISTINCT. It's just that the NOT EXISTS on it's own is not sufficient.

千紇 2024-12-15 10:58:23

我自己没有使用过这个,但是使用 NOCYCLE 选项的 CONNECT BY 怎么样?当它看到循环时应该停止遍历树。 Oracle 11i 肯定有这个,我认为它是在 Oracle 10g 时期的某个时候出现的。

I have not worked with this myself, but what about a CONNECT BY with the NOCYCLE option? That should stop travering the tree when it sees a loop. Oracle 11i definitely has that, I think it came in somewhere in the Oracle 10g period.

哆啦不做梦 2024-12-15 10:58:23

这可能会有所帮助,直到访问量超过 4000 字节。循环应该是不可能的,但这条线只是作为一个例子。

   WITH descendants(node, lvl, pth, visited) AS
    (
    SELECT child node, 1, cast(child as varchar2(4000)), '/'||listagg(child,'/') within group (order by child) over()||'/'
      FROM t 
     where parent = 2
     UNION ALL
    SELECT child, lvl+1, pth||'/'||child, D.visited||listagg(child,'/') within group (order by child) over()||'/'
      FROM T
     INNER JOIN descendants D
        ON T.parent = D.node
     WHERE D.visited not like '%/'||child||'/%'
    )
    cycle node set cyc to '1' default '0' 
    SELECT distinct node
      FROM descendants
     order by node
    ;

This might help until visited exceeds 4000 bytes. Cycles should not be possible but the line is there just as an example.

   WITH descendants(node, lvl, pth, visited) AS
    (
    SELECT child node, 1, cast(child as varchar2(4000)), '/'||listagg(child,'/') within group (order by child) over()||'/'
      FROM t 
     where parent = 2
     UNION ALL
    SELECT child, lvl+1, pth||'/'||child, D.visited||listagg(child,'/') within group (order by child) over()||'/'
      FROM T
     INNER JOIN descendants D
        ON T.parent = D.node
     WHERE D.visited not like '%/'||child||'/%'
    )
    cycle node set cyc to '1' default '0' 
    SELECT distinct node
      FROM descendants
     order by node
    ;
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文