父/子数据库包含循环引用
我有一个关键字表,其中每个关键字都分配有一个 id 并且是唯一的。我有第二个表,它将父关键字的 id 链接到子关键字的 id。一个关键字最多可以有大约 800 个子项,或者根本没有。孩子可以是更多关键字的父母(等等......)
我遇到的问题是孩子(或孙子或曾孙)可以是导致循环结构的初始关键字的父母。我正在尝试使用递归函数为初始关键字构建树数据结构,但该函数要么永远不会结束,要么超出了 Python 中的 1000 级递归限制。
是否有更好的方法来设计我的父/子表来防止这种情况发生(或在插入期间进行预先检查),或者是否有更好的方法来编写递归函数来防止这种情况发生?我试图限制递归函数的深度,但遇到了单级问题(即子级是父级的父级)。同样,我的目标是为初始关键字创建一个树结构。
Table Keyword:
id int(11) not null primary key auto_increment (id of keyword)
text varchar(255) unique (keyword text e.g. "computer help desk")
Table Keyword_Relation:
id int(11) not null primary key auto_increment (id for parent/child combo, not keyword id)
parent int(11) (id of parent keyword)
child int(11) (id of child keyword)
I have a keyword table where each keyword is assigned an id and is unique. I have a second table that links ids of parent keywords to ids of child keywords. One keyword can have up to approx 800 children or have none at all. Children can be parents of yet more keywords (and on and on...)
The issue I am running into is a child (or grandchild or great grandchild) can be the parent of the initial keyword causing a cyclical structure. I am attempting to build a tree data structure for the initial keyword using a recursive function but the function either never ends, or goes past the 1000 level recursion limit in Python.
Is there a better way to design my parent/child table to prevent this (or do upfront checking during inserts) or is there a better way to write a recursive function to prevent this condition from happening? I tried to limit the depth of the recursion function but ran into single level issues (i.e. child being the parent of the parent). Again, my goal is to create a tree structure for an initial keyword.
Table Keyword:
id int(11) not null primary key auto_increment (id of keyword)
text varchar(255) unique (keyword text e.g. "computer help desk")
Table Keyword_Relation:
id int(11) not null primary key auto_increment (id for parent/child combo, not keyword id)
parent int(11) (id of parent keyword)
child int(11) (id of child keyword)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
您想要做的是创建拓扑排序。有许多已发布的方法可以最佳地执行此操作,这取决于您的架构和首选方法。
就您而言,听起来您没有多父母。
但我如何以编程方式处理它是从叶节点(即没有子节点的节点)开始并沿树向上上升。
上升时,保留遇到的节点的集合。如果你重复一次遭遇,那么就会存在循环,并且拓扑排序是不可能的。
你不会得到无限循环,但你的拓扑肯定有可能有超过 1000 个节点......所以递归对你来说可能是不可能的。
编辑:回答您关于“更好的设计”的问题......如果可能的话,存储根节点标识符可能是有利的。
也就是说:给定一个父节点、子节点、孙子节点、曾孙节点、曾曾孙节点……
每一行不仅包含其直接父节点的 ID,还包含根节点父节点 ID ...或一些“已知良好”的根节点
如果您这样做,您可以通过仅上升直到根节点来加速拓扑排序方法,并且仅包括具有相同根节点的集合。
What you are trying to do is create a topological sort. There are a number of methods published for doing this optimally and it will depend on your schema and preferred method.
In your case it sounds like you do not have multi-parent.
But how I have dealt with it programmatically is to start with the leaf nodes (that is nodes with no children) and ascend the tree.
While ascending, keep a collection of nodes you have encountered. If you ever repeat an encounter then a cycle exists and a topological sort is not possible.
You will not get an infinite loop, but it's certainly possible that your topology will have more than 1000 nodes... so recursion may not be possible for you.
EDIT: To answer your question about "better design".... If possible it may be advantageous to store the root node identifier.
That is: given a parent, child, grandchild, greatgrandchild, greatgreat....grandchild
Each row will not only contain their immediate parent's ID but also the root node Parent ID... or some "known good" root node
IF you do this you can speed up topological sort methods by only ascending up until the root node, and only include sets having the same root node.
您可以从树的顶部开始,只跟踪已经找到的节点并忽略它们。
You can start at the top of the tree, and just keep track of the nodes you've already found and ignore them.