正确的数据结构来维持一对多关系
我想在数据库中表示文档。有几种不同类型的文档。所有文档都有某些共同点,但并非所有文档都是相同的。
例如,假设我有一个基本的文档表...
TABLE docs (
ID
title
content
)
现在假设我有一个可以属于用户的文档子集,并且可以具有与其关联的附加信息。我可以执行以下操作...
TABLE docs (
ID
userID -> users(ID)
title
content
additionalInfo
)
...但是,这将导致表中出现大量空值,因为只有某些文档可以属于用户,而不是全部。因此,我创建了第二个表“ownedDocs”来扩展“docs”:
TABLE ownedDocs (
docID -> docs(ID)
userID -> users(ID)
additionalInfo
)
我想知道:这是正确的方法吗? (我很担心,因为虽然所有内容都在一个表中,但我在文档和用户之间具有一对多关系。但是,通过创建一个新表ownedDocs,数据结构看起来像我在文档之间具有多对多关系和用户 - 这永远不会发生。)
提前感谢您的帮助
I want to represent documents in a database. There are several different types of documents. All documents have certain things in common, but not all documents are the same.
For example, let's say I have a basic table for documents...
TABLE docs (
ID
title
content
)
Now let's say I have a subset of documents that can belong to a user, and that can have additional info associated with them. I could do the following...
TABLE docs (
ID
userID -> users(ID)
title
content
additionalInfo
)
...however this will result in a lot of null values in the table, as only some documents can belong to a user, not all. So instead I have created a second table "ownedDocs" to extend "docs":
TABLE ownedDocs (
docID -> docs(ID)
userID -> users(ID)
additionalInfo
)
I am wondering: Is this the right way to do it? (I am worried because while everything is in one table, I have a one-to-many relationship between docs and users. However, by creating a new table ownedDocs, the datastructure looks like I have a many-to-many relationship between docs and users - which will never occur.)
Thanks in advance for your help
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
明白了,那么您需要在
docs
表中拥有 userid 以确保一对多关系(一个用户,可能有多个文档)。如果文档未与特定用户关联(如
userid
列为空),我不认为附加信息列为空有什么危害。将附加信息拆分到另一个表仍然意味着一对一的关系,因此您最好使用一个包含文档、用户和附加信息的表......Understood, then you need to have the userid in the
docs
table to ensure a one-to-many relationship (one user, potentially many documents).I don't see the harm in additional info columns being null if a document is not associated to a particular user, noted by the
userid
column being null. Splitting the additional info to another table still means a one-to-one relationship, so you're best off using one table with doc, user and additional info...如果您将
OwnedDocs.DocId
作为主键,那么很明显 1:N 关系是不可能的。零或一对一关系的建模是如果我们只有一个子类型,那么具有 NULL 列的单个表是一种合理的方法,但最好确保仅在适当的情况下填充子类型属性。检查约束来强制执行此规则:
或者甚至可能是此规则:
属性之间的关系不会显示在 ERD 中(除非您使用命名约定),当然,
AdditionalInfo
的强制性质。在第二种情况下,拥有的文档不会很明显。一旦我们有几个这样的子类型,单独的表的情况就变得引人注目,特别是如果子类型构成一个弧,例如文档可以是 FinancialDocument 或 MedicalDocument 或 PersonnelDocument但不超过一个类别,我曾经使用带有大量空列、视图和检查约束的单个表实现了这样的模型。太可怕了。子类型表绝对是最佳选择。
If you make
OwnedDocs.DocId
the primary key it will be quite clear that a 1:N relationship is impossible.The modelling of zero or one to one relationships is tricky. If we have just the one sub-type then the single table with NULL columns is a reasonable approach. However it is good practice to ensure that the sub-types attributes are only populated when appropriate. In the given example that would mean a check constraint to enforce this rule:
Or maybe even this rule:
The relationship between attributes won't show up in an ERD (unless you use a naming convention). For sure, the mandatory nature of
AdditionalInfo
for owned documents won't be obvious in the second case.Once we have several such sub-types the case for separate tables becomes compelling, especially if the sub-types constitute an arc e.g. a Document can be a FinancialDocument or a MedicalDocument or a PersonnelDocument but not more than one category. I once implemented such a model using a single table with lots of null columns, views and check constraints. It was horrible. Sub-type tables are definitely the way to go.
这取决于您想要实现的规范化级别。通常,根据您提供的描述,我会像这样构建我的数据库:
如果我错了,有人纠正我,但这应该是第三范式。这使结构保持良好和干净,并且仅使用按预期存储数据所需的位。您可以独立存储所有元素,但在需要时相关联。
根据附加信息的性质,您需要进行一些更改。例如,附加信息是否始终对应于用户?如果文档与用户关联,是否总是会提供它?如果是这样,那么您可以将其添加到 users_docs 表中。但这至少应该向您展示标准化。
It depends on the level of normalization you want to achieve. Typically, based on the description you are providing, I would structure my DB like so:
Someone correct me if I am wrong, but this should be 3rd normal form. This keeps the structure nice and clean and uses only the bits required to store the data as expected. You can store all elements independently but related where needed.
Depending on the nature of the additional information, you need to make some changes. For example, will the additional info ALWAYS correspond to a user? Will it always be supplied if the doc is associated to a user? If so, then you can add it to the users_docs table. But this should at least show you the normalization.