Creating a flattened table/view of a hierarchically-defined set of data

Posted 2024-09-12 08:38:42

I have a table containing hierarchical data. There are currently ~8 levels in this hierarchy.

I really like the way the data is structured, but performance is dismal when I need to know if a record at level 8 is a child of a record at level 1.

I have PL/SQL stored functions which do these lookups for me, each having a select * from tbl start with ... connect by... statement. This works fine when I'm querying a handful of records, but I'm in a situation now where I need to query ~10k records at once and for each of them run this function. It's taking 2-3 minutes where I need it to run in just a few seconds.

Using some heuristics based on my knowledge of the current data, I can get rid of the lookup function and just do childrecord.key LIKE parentrecord.key || '%' but that's a really dirty hack and will not always work.

So now I'm thinking that for this hierarchically-defined table I need to have a separate parent-child table, which will contain every ancestor/descendant relationship. For a single chain going from level 1 to level 8 there would be 7 + 6 + ... + 1 = 28 records, associating 1 with 2, 1 with 3, ..., 1 with 8, then 2 with 3, 2 with 4, ..., 2 with 8, and so forth.

My thought is that I would need to have an insert trigger where it will basically run the connect by query and for every match going up the hierarchy it will insert a record in the lookup table. And to deal with old data I'll just set up foreign keys to the main table with cascading deletes.

Are there better options than this? Am I missing another way that I could determine these distant ancestor/descendant relationships more quickly?

EDIT: This appears to be exactly what I'm thinking about: http://evolt.org/working_with_hierarchical_data_in_sql_using_ancestor_tables

聽兲甴掵 2024-09-19 08:38:42

So what you want is to materialize the transitive closure. That is, given this application table ...

 ID   | PARENT_ID
------+----------
    1 | 
    2 |         1
    3 |         2
    4 |         2
    5 |         4

... the graph table would look like this:

 PARENT_ID | CHILD_ID
-----------+----------
         1 |        2
         1 |        3
         1 |        4
         1 |        5
         2 |        3
         2 |        4
         2 |        5
         4 |        5
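
As a rough illustration, the graph table above can be derived from the application table with a recursive query. This is only a sketch under stated assumptions: it uses Python's sqlite3 module and a recursive CTE rather than Oracle and CONNECT BY, and the table name `source` is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for Oracle here; "source" is a hypothetical name
# for the application table shown above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER PRIMARY KEY, parent_id INTEGER)")
conn.executemany(
    "INSERT INTO source (id, parent_id) VALUES (?, ?)",
    [(1, None), (2, 1), (3, 2), (4, 2), (5, 4)],
)

# Start from every direct parent/child edge, then keep walking downward while
# carrying the original ancestor along, so each (ancestor, descendant) pair
# is produced exactly once in a tree.
closure_sql = """
WITH RECURSIVE graph (parent_id, child_id) AS (
    SELECT parent_id, id FROM source WHERE parent_id IS NOT NULL
    UNION ALL
    SELECT g.parent_id, s.id
    FROM graph AS g
    JOIN source AS s ON s.parent_id = g.child_id
)
SELECT parent_id, child_id FROM graph ORDER BY parent_id, child_id
"""
closure = conn.execute(closure_sql).fetchall()

for parent_id, child_id in closure:
    print(parent_id, child_id)
```

With the five sample rows this yields the same eight pairs as the graph table. Once the pairs are stored and indexed, an "is X a descendant of Y" check becomes a single lookup instead of a hierarchical query per record.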

It is possible to maintain a table like this in Oracle, although you will need to roll your own framework for it. The question is whether it is worth the overhead. If the source table is volatile then keeping the graph data fresh may cost more cycles than you will save on the queries. Only you know your data's profile.
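
To give a feel for what such a framework has to do, here is a minimal sketch of the easiest case only: keeping the graph table current when new rows are inserted. It assumes SQLite (via Python's sqlite3) instead of Oracle, hypothetical table and trigger names (`source`, `graph`, `source_after_insert`), and that a parent is always inserted before its children. Updates and deletes are deliberately not handled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source (id INTEGER PRIMARY KEY, parent_id INTEGER);
CREATE TABLE graph (
    parent_id INTEGER NOT NULL,
    child_id  INTEGER NOT NULL,
    PRIMARY KEY (parent_id, child_id)
);
-- Hypothetical trigger: covers inserts only, and assumes the parent row
-- (and therefore its ancestor rows in graph) already exists.
CREATE TRIGGER source_after_insert AFTER INSERT ON source
WHEN NEW.parent_id IS NOT NULL
BEGIN
    -- Link the new row to its direct parent.
    INSERT INTO graph (parent_id, child_id) VALUES (NEW.parent_id, NEW.id);
    -- Link the new row to every ancestor the parent already has.
    INSERT INTO graph (parent_id, child_id)
    SELECT parent_id, NEW.id FROM graph WHERE child_id = NEW.parent_id;
END;
""")

# Insert parents before children so the ancestors are already in graph.
conn.executemany(
    "INSERT INTO source (id, parent_id) VALUES (?, ?)",
    [(1, None), (2, 1), (3, 2), (4, 2), (5, 4)],
)

maintained = conn.execute(
    "SELECT parent_id, child_id FROM graph ORDER BY parent_id, child_id"
).fetchall()
print(maintained)
```

The insert path is cheap because it only reads the parent's existing ancestor rows. Moving a subtree or deleting a row is where the real work lies, which is why the overhead question matters for a volatile source table.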

I don't think you can maintain such a graph table with CONNECT BY queries and cascading foreign keys. Too much indirect activity, too hard to get right. Also a materialized view is out, because we cannot write a SQL query which will zap the 1->5 record when we delete the source record for ID=4.

So I suggest you read a paper called "Maintaining Transitive Closure of Graphs in SQL" by Dong, Libkin, Su and Wong. It contains a lot of theory and some gnarly (Oracle) SQL, but it will give you the grounding to build the PL/SQL you need to maintain a graph table.


"can you expand on the part about it
being too difficult to maintain with
CONNECT BY/cascading FKs? If I control
access to the table and all
inserts/updates/deletes take place via
stored procedures, what kinds of
scenarios are there where this would
break down?"

Consider the record 1->5, which is a short-circuit of 1->2->4->5. Now what happens if, as I said before, we delete the source record for ID=4? Cascading foreign keys could delete the entries that mention ID 4 directly (1->4, 2->4 and 4->5). But that leaves 1->5 (and indeed 2->5) in the graph table, although they no longer represent a valid path in the graph.
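
The stale rows are easy to reproduce. The following sketch assumes SQLite through Python's sqlite3 rather than Oracle, hypothetical table names `source` and `graph`, and a source table without a self-referencing foreign key, so deleting ID=4 simply orphans ID=5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE source (id INTEGER PRIMARY KEY, parent_id INTEGER);
CREATE TABLE graph (
    parent_id INTEGER NOT NULL REFERENCES source (id) ON DELETE CASCADE,
    child_id  INTEGER NOT NULL REFERENCES source (id) ON DELETE CASCADE,
    PRIMARY KEY (parent_id, child_id)
);
""")
conn.executemany(
    "INSERT INTO source (id, parent_id) VALUES (?, ?)",
    [(1, None), (2, 1), (3, 2), (4, 2), (5, 4)],
)
conn.executemany(
    "INSERT INTO graph (parent_id, child_id) VALUES (?, ?)",
    [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (4, 5)],
)

# The cascade removes only the graph rows that name ID 4 in one of their columns.
conn.execute("DELETE FROM source WHERE id = 4")

remaining = conn.execute(
    "SELECT parent_id, child_id FROM graph ORDER BY parent_id, child_id"
).fetchall()
print(remaining)  # 1->5 and 2->5 are still present although the path is gone
```

Nothing in the rows 1->5 and 2->5 records that the path ran through ID 4, so a foreign key on PARENT_ID or CHILD_ID alone has no way to find them.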

What might work (I think, I haven't done it) would be to use an additional synthetic key in the source table, like this:

 ID   | PARENT_ID | NEW_KEY
------+-----------+---------
    1 |           | AAA
    2 |         1 | BBB
    3 |         2 | CCC
    4 |         2 | DDD
    5 |         4 | EEE

Now the graph table would look like this:

 PARENT_ID | CHILD_ID | NEW_KEY
-----------+----------+---------
         1 |        2 | BBB
         1 |        3 | CCC
         1 |        4 | DDD
         1 |        5 | DDD
         2 |        3 | CCC
         2 |        4 | DDD
         2 |        5 | DDD
         4 |        5 | DDD

So the graph table has a foreign key referencing the relationship in the source table which generated it, rather than linking to the ID. Then deleting the record for ID=4 would cascade deletes of all records in the graph table where NEW_KEY=DDD.
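
A small sketch of that idea, again untested against Oracle: it assumes SQLite through Python's sqlite3, hypothetical table names `source` and `graph`, and a UNIQUE constraint on NEW_KEY so that it can be the target of a foreign key. The graph rows are copied from the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE source (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER,
    new_key   TEXT NOT NULL UNIQUE
);
CREATE TABLE graph (
    parent_id INTEGER NOT NULL,
    child_id  INTEGER NOT NULL,
    -- The only foreign key points at the synthetic key, not at the IDs.
    new_key   TEXT NOT NULL REFERENCES source (new_key) ON DELETE CASCADE,
    PRIMARY KEY (parent_id, child_id)
);
""")
conn.executemany(
    "INSERT INTO source (id, parent_id, new_key) VALUES (?, ?, ?)",
    [(1, None, "AAA"), (2, 1, "BBB"), (3, 2, "CCC"), (4, 2, "DDD"), (5, 4, "EEE")],
)
conn.executemany(
    "INSERT INTO graph (parent_id, child_id, new_key) VALUES (?, ?, ?)",
    [
        (1, 2, "BBB"), (1, 3, "CCC"), (1, 4, "DDD"), (1, 5, "DDD"),
        (2, 3, "CCC"), (2, 4, "DDD"), (2, 5, "DDD"), (4, 5, "DDD"),
    ],
)

# Deleting ID=4 removes its NEW_KEY, and the cascade takes every DDD row with it.
conn.execute("DELETE FROM source WHERE id = 4")

after_delete = conn.execute(
    "SELECT parent_id, child_id FROM graph ORDER BY parent_id, child_id"
).fetchall()
print(after_delete)
```

In this sketch the rows 1->5 and 2->5 disappear together with 1->4, 2->4 and 4->5, because each of them carries the key of the row whose deletion invalidates it.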

This would work if any given ID can only have zero or one parent IDs. But it won't work if it is permissible for this to happen:

 ID   | PARENT_ID
------+----------
    5 |         2
    5 |         4

In other words, the edge 1->5 would then represent both 1->2->4->5 and 1->2->5, so it could not be tied to a single NEW_KEY. So, what might work depends on the complexity of your data.
