Representing many-to-many and one-to-many as a single denormalized view, as many-to-many, or some other way?
I have a model question, followed by a view question in a separate SO post, which I will link to when I create it. I'm making two posts so I can do two acceptances.
Now, let's start by supposing that I have the following schema. This is a simplified version of a real schema I'm designing, with irrelevant columns omitted. Names have been changed to protect the innocent, and my job.
tree has the following attributes:
tree.id
tree.dob
tree.height
tree.forestid
forestid is a foreign key to the forest table, which is just a way to aggregate multiple tree rows. It has an id column and some metadata columns.
tree and forest both have potentially many treedata rows. treedata contains:
treedata.value
treedata.treeid
treedata.forestid
treedata.treeid and treedata.forestid are constrained so that one of the two must be null.
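The schema and the null constraint described above can be sketched as follows. This is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database, and the choice of SQLite, the column types, the helper names make_db and insert_rejected, and the reading of "one of the two must be null" as "exactly one of the two is set" are all assumptions rather than part of the real schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE forest (
    id INTEGER PRIMARY KEY
    -- metadata columns omitted
);
CREATE TABLE tree (
    id       INTEGER PRIMARY KEY,
    dob      TEXT,
    height   REAL,
    forestid INTEGER REFERENCES forest(id)  -- NULL for an ungrouped tree
);
CREATE TABLE treedata (
    value    TEXT,
    treeid   INTEGER REFERENCES tree(id),
    forestid INTEGER REFERENCES forest(id),
    -- Assumption: exactly one of the two owner columns is set.
    CHECK ((treeid IS NULL) + (forestid IS NULL) = 1)
);
"""


def make_db():
    """Create an in-memory database holding the three tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def insert_rejected(conn, treeid, forestid):
    """Try to insert a treedata row; return True if a constraint refuses it."""
    try:
        conn.execute(
            "INSERT INTO treedata (value, treeid, forestid) VALUES ('x', ?, ?)",
            (treeid, forestid),
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

With one forest and one tree in place, a row owned by only the tree or only the forest is accepted, while a row that names both owners or neither is refused.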
If tree.forestid is not null, then the relationship between treedata and tree is many-to-many, and forest is the linker table. Otherwise, the relationship between tree and treedata is one to many. It is very important to my application that the user is able to group trees together into forests on the fly via the UI, and to set a treedata.value for the entire forest, but also be able to work with individual trees. Now, I can think of a couple of ways to represent this. One is to say that every tree has a forestid and is a forest of at least size 1. Then, the relationship is always many to many. Another is to provide a denormalized view along the lines of
-- data attached directly to a tree
select tree.*, treedata.*
from tree, treedata
where tree.id = treedata.treeid
union
-- data attached to the tree's forest
select tree.*, treedata.*
from tree, treedata
where tree.forestid = treedata.forestid
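To make the effect of such a view concrete, here is a small runnable sketch using Python's sqlite3 module. The view name tree_values, the helper values_for, the trimmed-down columns, and the sample rows are all hypothetical. It also uses UNION ALL where the query above uses UNION, so that two identical values on the same tree are not collapsed into one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tree (id INTEGER PRIMARY KEY, forestid INTEGER);
CREATE TABLE treedata (value TEXT, treeid INTEGER, forestid INTEGER);

-- Hypothetical view: one row per (tree, value that applies to that tree).
CREATE VIEW tree_values AS
SELECT tree.id AS treeid, treedata.value
FROM tree JOIN treedata ON tree.id = treedata.treeid
UNION ALL
SELECT tree.id AS treeid, treedata.value
FROM tree JOIN treedata ON tree.forestid = treedata.forestid;

-- Trees 1 and 2 share forest 10; tree 3 is ungrouped.
INSERT INTO tree (id, forestid) VALUES (1, 10), (2, 10), (3, NULL);
INSERT INTO treedata (value, treeid, forestid) VALUES
    ('forest-wide', NULL, 10),
    ('only tree 3', 3, NULL);
""")


def values_for(tree_id):
    """Return every value visible for one tree, whichever way it is attached."""
    rows = conn.execute(
        "SELECT value FROM tree_values WHERE treeid = ? ORDER BY value",
        (tree_id,),
    )
    return [value for (value,) in rows]
```

Because NULL never compares equal to NULL, an ungrouped tree does not pick up rows through the forest branch of the view.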
Yet a third way would be to have a forestid column in tree and to drop the forest table entirely. Down that path I see a difficulty in getting ACID guarantees with respect to proper incrementing of tree.forestid. There's also the possibility that forests should contain their own metadata. I'd love to hear more ways to represent this, and also to get the opinion of more seasoned database people with respect to which way is preferable, and top marks if you explain why you think so by citing examples from your own experience.
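For contrast with the third option, here is a sketch of grouping trees on the fly when the forest table is kept, so that the database allocates the new forest id instead of the application incrementing tree.forestid itself. It assumes SQLite via Python's sqlite3 module; make_db and group_trees are hypothetical helper names, and the sketch does not settle which design is preferable.

```python
import sqlite3


def make_db():
    """Create an in-memory database with just the columns this sketch needs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE forest (id INTEGER PRIMARY KEY);
    CREATE TABLE tree (
        id       INTEGER PRIMARY KEY,
        forestid INTEGER REFERENCES forest(id)
    );
    """)
    return conn


def group_trees(conn, tree_ids):
    """Create a forest and move the given trees into it in one transaction."""
    with conn:  # commits on success, rolls back if anything raises
        # The database picks the new id; the application never increments one.
        forest_id = conn.execute("INSERT INTO forest DEFAULT VALUES").lastrowid
        conn.executemany(
            "UPDATE tree SET forestid = ? WHERE id = ?",
            [(forest_id, tree_id) for tree_id in tree_ids],
        )
    return forest_id
```

Calling group_trees(conn, [1, 2]) returns the id of the newly created forest, and both trees point at it once the transaction commits.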
Response to Martin Dom's suggestion of a TreeComposite table:
Thanks for your reply. I wanted to give the suggestion of a TreeComposite a day to simmer before responding. First of all, your way does model the relationship I'm expressing in normal form, so yes, I think that you do understand the question. However, I think I made a silly mistake by naming my tables Tree and Forest, because Forests don't need to be recursively composable into each other. They aren't computer science trees. They're just bark-and-twig trees. The parentid model still represents what I need in normal form (it's a generalization of what I need), and it has the advantage that Trees and Forests are now "the same thing", which is on the surface a complexity win. Even so, I fear that in the soil it would be a tangled mess.
The problem is that whether or not they're called the same thing in my model, they're not the same thing to my controller or view. E.g., a TreeComposite with child nodes will probably have at least one attribute for which each node will have a distinct value. In that case, I need to use a different widget in my view to display multiple values for the attribute. Put another way, I need to be able to display each TreeComposite that is its own parent as a single row, and what that row looks like depends on whether the TreeComposite has children.
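The per-row decision described here can be sketched as a query that labels each top-level TreeComposite by whether it has children. This assumes SQLite via Python's sqlite3 module. The table layout, the sample rows, the helper top_level_rows, and the choice to mark a top-level row with a NULL parentid (rather than a row pointing at itself) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE treecomposite (
    id       INTEGER PRIMARY KEY,
    parentid INTEGER REFERENCES treecomposite(id)  -- NULL marks a top-level row
);
-- 1 is a forest containing 2 and 3; 4 is a standalone tree.
INSERT INTO treecomposite (id, parentid)
VALUES (1, NULL), (2, 1), (3, 1), (4, NULL);
""")


def top_level_rows(conn):
    """Return (id, kind) per top-level row, where kind is 'forest' or 'tree'."""
    return conn.execute("""
        SELECT root.id,
               CASE WHEN EXISTS (SELECT 1
                                 FROM treecomposite AS child
                                 WHERE child.parentid = root.id)
                    THEN 'forest' ELSE 'tree' END AS kind
        FROM treecomposite AS root
        WHERE root.parentid IS NULL
        ORDER BY root.id
    """).fetchall()
```

The view layer would then pick its widget from the kind column, which is the extra classification step that the composite model requires.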
So the first thing I have to do after I extract a TreeComposite from my model is decide whether it's "really" a Tree or a Forest. Why do that, when I can store it in normal form as Tree and Forest directly, thereby making my View and Controllers simpler without hurting my model? Searching for TreeData is also complicated by the parent-child relationship. I have to connect through N TreeComposites until I find the root node, and then search for TreeData pointing at the root node. This shreds data locality. Meanwhile, if I have a Tree with a ForestID, and I want that Tree's data, I never need to look at the Forest table at all. I can do a direct foreignkey <-> foreignkey join to TreeData (where Tree.ForestID = TreeData.ForestID).
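The root-node lookup described above can be illustrated with a recursive common table expression. This is a sketch under assumptions: SQLite via Python's sqlite3 module, a hypothetical treedata.compositeid column pointing at a single TreeComposite, a NULL parentid marking the root, and a hypothetical helper root_values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE treecomposite (
    id       INTEGER PRIMARY KEY,
    parentid INTEGER REFERENCES treecomposite(id)
);
CREATE TABLE treedata (
    value       TEXT,
    compositeid INTEGER REFERENCES treecomposite(id)
);
-- 1 is the root; 2 is a child of 1; 3 is a child of 2.
INSERT INTO treecomposite (id, parentid) VALUES (1, NULL), (2, 1), (3, 2);
INSERT INTO treedata (value, compositeid) VALUES ('root value', 1);
""")


def root_values(conn, composite_id):
    """Walk parentid links up to the root, then return the root's values."""
    rows = conn.execute("""
        WITH RECURSIVE chain(id, parentid) AS (
            SELECT id, parentid FROM treecomposite WHERE id = ?
            UNION ALL
            SELECT p.id, p.parentid
            FROM treecomposite AS p
            JOIN chain ON p.id = chain.parentid
        )
        SELECT treedata.value
        FROM chain
        JOIN treedata ON treedata.compositeid = chain.id
        WHERE chain.parentid IS NULL
        ORDER BY treedata.value
    """, (composite_id,))
    return [value for (value,) in rows]
```

Each level of nesting adds one more step to the walk, whereas the Tree.ForestID = TreeData.ForestID join stays a single equality join regardless of how many trees a forest holds.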
Comments (1)
How about a composite kind of structure? Let's say you have a table TreeComposite which has a TreeComposite.ParentId foreign key that points to another TreeComposite. A TreeData can only reference a single TreeComposite, so the constraint that it reference either a forest or an individual tree, but not both, is maintained.
The only problem I see is that you could have multiple levels of composition, which may or may not make sense depending on the problem you're trying to solve.
Your relationship between TreeComposite and TreeData would be many-to-many via an intermediate table, if I'm understanding your constraints properly.
Using this model you can treat a tree and a forest as though they were the same kind of object, and apply metadata with this in mind.