SQL 架构:只有一个表存储多个实体类型是否合理? (使用自连接)
我很少遇到这样的情况:多个实体类型的单个表似乎比每个实体类型一个表更好。这是一个对我来说有意义的例子,但从学术上来说它似乎是错误的。
问题:我可以做到这一点并且仍然拥有“健全”的架构吗?
示例如下
假设两种实体类型:公司和个人。公司通常由一个人拥有,但有时另一家公司拥有一家公司。
坚持这一想法并加以补充,假设每个公司都有一名注册代理人,负责公司的合法创建。进一步说明,注册代理人可以是个人也可以是公司。
如果您认为公司[子]的所有者[母公司]可以是个人或公司,您可能会开始看到保持第三范式和避免冗余的挑战。
与我的示例相反,如果只有人们可以拥有公司,则所有权链接表非常传统,具有以下列:OwnershipID(有点不必要)、CorporationID、PersonID。
相反,您需要类似:OwnershipID、CorporationID、OwnerID、OwnerEntityType(公司或个人)。不要误会我的意思,您可以完成这项工作,但至少可以说它不会很有趣。
继续我给出的示例,您需要为每个公司分配一个代理。通常,代理人是业主之一(一个人)。在这种情况下,您确实希望链接回该人的一条记录。您不希望将该人记录为所有者,然后再次记录为代理(在代理表中)。那是多余的。
与该“问题”类似,注册代理人也可以是一家公司,例如律师事务所、注册会计师或商业申报公司,仅举一些典型的例子。就像代理人一样,代理人公司确实不应该拥有自己作为代理人实体的记录。相反,它需要链接回公司表中其公司存在的信息。 [除了我最终说没有 CorporationEntity 表]
就像将每个公司与其任何类型、个人或公司的所有者相匹配的链接表一样,您可以拥有一个代理链接表: AgentRepresentationID、CorporationID 、 AgentID、 AgentType...但同样,当您必须将相关代理放在一起(一些来自 Person 表,一些来自 Corporation 表)时,这会很丑陋(IMO)。
这就是为什么我说,在这种情况下,您可以看到中立实体类型如何具有优势。它会是这样的:
表:EntityAll
关键列:EntityID、EntityType(或 EntityTypeID,如果您坚持,请链接出来以获取描述)、EntityName(涉及名称和不同类型……与本文主题无关)
链接表:CorporationOwnership
关键列:OwnershipID(我的评论是,这是不必要的)、ChildEntityID(所拥有的实体;为清楚起见,命名为“Child”,我不会这样命名)ParentEntityID(父实体)
链接表:代理代表
关键列:AgentRepresentationID(...我不会说)、CorporationEntityID(所代表的公司实体)、AgentEntityID(来自实体表,相当于此处代理的记录)
虽然您可能对我的架构感到满意,您应该对链接表中的列命名感到有点困扰。这让我很烦恼。通常,这些表中的第二列和第三列名称与您在每个实体各自的表中加入的列名称完全匹配(哈哈,但每个实体没有各自的表,因此您不能让链接表列名称与源列名称,因为它们是同一列)。从技术上讲,这并不重要,但它会打破你的命名约定,这确实很重要,但还不足以不这样做。
如果我还没有足够好地把它带回家,您可以按照以下方法将其组合在一起。您可以自行加入 EntityAll 表以获得所需的内容。
列出所有军团及其所有者(在 T-SQL 中):
SELECT Corp.EntityName as CorpName, Owner.EntityName as OwnerName
FROM EntityAll as Corp
JOIN CorporationOwnership as Link on (Corp.EntityID = Link.ChildEntityID)
JOIN EntityAll as Owner on (Link.ParentEntityID = Owner.EntityID)
因此,您将执行相同的操作来返回代理,而不是所有者。
我意识到这不是我们被训练来构建表的方式,但我强烈地感觉到我的解决方案消除了冗余数据,并使编码、管理和阅读变得更容易。
PS 我最近提供了这个例子作为对 SO 的一个老问题的答案。这是一个老问题,没有对话。我需要实现这个例子,并且我很好奇这种架构的影响。
这是之前的问题/答案: 良好的数据库表设计:一个表混合不同的实体或每个实体单独的表
I rarely come across a situation where a single table for multiple entity types seems better than one table per entity type. Here's an example that makes sense to me, but academically it seems wrong.
QUESTION: Can I do this and still have a "sound" architecture?
Example Follows
Suppose two entity types, a corporation and a person. A corporation is typically owned by a person, but sometimes another corporation owns a corporation.
Holding onto that thought, and adding to it, let's say that every corporation has a registered agent attached to it who is responsible for the legal creation of the corporation. Furthering my illustration, the registered agent can be either a person or a corporation.
If you consider that the owner [parent] to the corporation [child] can be either a person or a corporation, you may begin to see the challenge in keeping the third normal form and avoiding redundancy.
In contrast to my example, if only people could own Corporations, the Ownership link table is very conventional, having columns: OwnershipID (sort of unecessary), CorporationID, PersonID.
Instead, you need something like: OwnershipID, CorporationID, OwnerID, OwnerEntityType (corp or person). Don’t get me wrong, you can make this work, but it won't be fun, to say the least.
Continuing on with the example I gave, you need to assign an agent to every Corporation. Usually, the agent is one of the owners (a person). In which case, you really do want to link back to the one record of that person. You don't want to have record of the person as an owner and then again as an agent (in an Agent table). That would be redundant.
Similarly to that "problem" a registered agent can also be a corporation, such as a law firm, a CPA, or a biz filings company, to name some typical examples. Just like the agent-person, an agent-corporation really should not get its own record as an agent-entity. Instead it needs to link back to the of its corporate existence in the Corporation table. [except that I'm ultimately saying to not have a CorporationEntity table]
Just like the link table that matched each corporation to its owner(s) of any type, person or corporation, you could have an agent link table of: AgentRepresentationID, CorporationID, AgentID, AgentType... but again, it would be ugly (IMO) when you have to pull together the related agents -- some from the Person table, some from the Corporation table.
This is why I say, in cases like this, you can see how a neutral entity type can be advantageous. It would be something like this:
Table: EntityAll
Key Columns: EntityID, EntityType (or EntityTypeID if you insist, link out to get the description), EntityName (there are concerns with names and different types... off topic to this post)
Link Table: CorporationOwnership
Key Columns: OwnershipID (again, my comment that this is kind of unecessary), ChildEntityID (the entity being owned; named "Child" for clarity, I wouldn't name it that) ParentEntityID (the parent entity)
Link Table: AgentRepresentation
Key Columns: AgentRepresentationID (...I won't say it), CorporationEntityID (the corp entity being represented), AgentEntityID (from the Entity table, equating to the record that's the agent here)
While you might be OK with my architecture, you should be a little bothered by the column naming in the link tables. It bothers me. Typically the second and third column names in those tables match exactly the name of the columns you JOIN in each entity's respective table (haha, but each entity doesn't have a respective table so you can't have the link table column names match the source column names because they are THE same column). Technically this does not matter, but it will break your naming conventions which does matter, but not enough to not do it.
In case I haven't driven it home well enough yet, here is how you'll pull it together. You JOIN the EntityAll table on its own self to get what you need.
List all Corps and their owners (in T-SQL):
SELECT Corp.EntityName as CorpName, Owner.EntityName as OwnerName
FROM EntityAll as Corp
JOIN CorporationOwnership as Link on (Corp.EntityID = Link.ChildEntityID)
JOIN EntityAll as Owner on (Link.ParentEntityID = Owner.EntityID)
Consequently, you'd do the same thing to return the agent, instead of the owner(s).
I realize this is not how we're trained to architect tables, but I feel pretty strongly that my solution eliminates redundant data AND makes it easier to code, manage and read.
P.S. I recently provided this example as an answer to an old question on SO. Being an old question, there was no dialogue. I need to implement this very example, and I’m curious about the ramifications of this architecture.
Here is the previous question/answer:
Good db table design: One table mixing different entities or separate table for each entity
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
我认为休·达文 (Hugh Darwen) 创造了术语“分布式键”和“分布式外键”,其中单个引用的键值恰好存在于多个引用相关变量(表)之一中;这将需要“多重赋值”的相关概念,以便原子地插入到引用和引用相关变量中。
虽然理论上这可以在 SQL-92 中使用可延迟模式级 ASSERTION(或者可能支持子查询的 CHECK 约束)来实现,但这是一个相当笨拙的过程,是程序性的(而不是基于集合)并且没有任何 SQL 产品曾经支持此功能(或者我怀疑永远不会)。
对于可用的 SQL 产品,我们能做的最好的事情就是使用复合键
(entity_ID,entity_type)
,并在引用时对entity_type
施加CHECK
约束例如,这种超类-子类模式在 SQL 中非常常见。
理想情况下,表名称应反映整个集合的性质。你们可能需要超越其他集合名称的组合来思考;也许询问特定业务领域的专家,例如会计师可能使用术语“工资单”而不是“员工薪水”。
另一个理想的情况是列的名称在整个架构中保持相同,但使用子类化方法时,您经常需要限定它们(这让我很烦恼!)例如,
注意我会为
agent_person_ID
例如因为对实体类型没有限制。原则上,我觉得为整个架构中的所有引用保留两列复合键会更好,而且我发现在实践中我经常不需要现在的实体类型,所以这个 SQL DDL 正在保存一个
JOIN
在 SQL DML 中:)I think it was Hugh Darwen who coined the terms 'distributed key' and 'distributed foreign keys', where a single referenced key value exists in exactly one of multiple referencing relvars (tables); this would require a related concept of 'multple assignment' in order to atomically insert to both the referenced and referencing relvars.
While this could in theory be achieved in SQL-92 using deferrable schema-level
ASSERTION
s (or perhapsCHECK
constraints that support subqueries), it's rather a clunky process, is procedural (rather than set-based) and there isn't a SQL product that has ever support this functionality (or ever will, I susepct).The best we can do with available SQL products is to use a compound key
(entity_ID, entity_type)
with aCHECK
constraint on theentity_type
in referencing tables to ensure there is no more than one referencing key value (note this is not the same as 'exactly one referencing key value') e.g.This superclass-subclass pattern is very common in SQL.
Ideally, a table name should reflect the nature of the set as a whole. You many need to think beyond a compound of other sets' names; perhaps ask a expert in the particular field of business e.g. an accountant may use the term 'payroll' rather than 'EmployeesSalaries'.
Another ideal is for a column's name to remain the same throughout the schema but with a subclassing approach you often need to qualify them (and this bothers me!) e.g.
Note I would have used a single column key for
agent_person_ID
e.g.because there is no restriction on entity type. In principle I feel better about retaining the two-column compound key for all references throughout the schema and I find in practice as often as not I need to now the entity type anyhow so this SQL DDL is saving a
JOIN
in SQL DML :)查找“泛化专业化关系建模”。
我认为有一种实体,我称之为“法人”。你所说的“人”,有些人可能称之为“自然人”,是一种特殊的法人。你所说的“公司”,有些人可能称之为“法人”,是一种不同类型的专门法人。
这样看来,“法人”、“人”和“公司”之间的关系可以被视为一种“gen-spec”(广义-专业化)模式。 gen-spec 在对象建模教程中得到了很多处理,并且非常自然地符合继承的概念。 gen-spec 在关系建模教程中经常被掩盖。但这个概念很好理解。
法人可以拥有一家公司,无论法人属于哪种专业类型。
您的 ENTITYALL 表符合 gen-spec 关系设计的一些功能,但您可以进一步开发该模型。特别是,如果我们有一个实体类型“automobile”,则没有特殊原因无法在 ENTITYALL 表中获得条目。但汽车不能拥有一家公司的事实现在已被掩盖。我想要某种将“人”和“公司”概括为“法人”的表格,但又不那么笼统以至于将“汽车”归类为“法人”。 ENTITYALL 对于我的喜好来说太通用了。
看看 gen-spec 的最佳示例,我们发现 gen 表的主键和每个 spec 表的主键都来自同一域。此外,除了维护专用实体的实体完整性之外,spec 表的主键还充当对 gen 表的外键引用。事实证明连接非常好。您的模式可以从这个设计技巧中受益。
Look up "generalization specialization relational modeling".
I think that there is a type of entity that I'll call "legal person". What you have called "person", and some might call "natural person" is a specialized kind of legal person. What you have called "corporation", and some might called "incorporated person", is a different kind of specialized legal person.
Seen this way, the relationship between "legal persons", "persons", and "corporations" can be seen as a gen-spec (generalization-specialization) pattern. The gen-spec gets a lot of treatment in the tutorials on object modeling, and fits pretty naturally with the concept of inheritance. gen-spec is often glossed over in tutorials on relational modeling. But the concept is well understood.
A legal person can own a corporation, regardless of which specialized type of legal person.
Your ENTITYALL table conforms to some of the features of a gen-spec relational design, but you could develop the model further. In particular, if we have an entity type "automobile", there's no particular reason why this wouldn't get an entry in an ENTITYALL table. But the fact that an automobile cannot own a corporation has now been obscured. I would want some kind of table that generalizes "person" and "corporation" into "legal person", but isn't so general that "automobile" would be classified as a "legal person". ENTITYALL is too generic for my preferences.
Looking at the best examples of gen-spec out there, we see that the primary key for the gen table and the primary key for each spec table are all drawn from the same domain. Further, the primary key of the spec tables operates as a foreign key reference to the gen table, in addition to maintaining entity integrity for the specialized entity. The joins turn out to be very nice. Your schema could profit from this design tid bit.
请记住,公司拥有人民的权利,并且两者都推断出某种形式的父母关系......如果业务规则是只能有一个父级,您可以使用自引用外键来指示父级。如果公司和个人被视为同一实体,则以下内容更容易解释:
ENTITY 表
entity_id
(主键)entity_type_code
(ENTITY_TYPE_CODE.entity_type_code 的外键)
,包含“person”和“corporation”等)parent_entity_id
(与ENTITY.entity_id
的外键关系,可为空)parent_entity_id
也必须可为空,因为 NULL 表示层次结构的根实体。但这也意味着使用具有分层查询支持的数据库(IE:不是 MySQL)。OWNERSHIP 表
如果业务规则需要支持一个实体的多个父级(反之亦然),则可以采用
OWNERSHIP
表。名称有很多,但它只是一个仅用于支持多对多关系的表。您是对的,如果业务规则需要它,它就是理想的方法。结论
数据建模的关键是基于业务规则构建模型。该模型不应经常更改,因此请尽力确保面向未来——业务分析师是完成此任务的关键。
Keeping in mind that Corporations have the rights of people, and that both infer a parental relationship of some format... If the business rule is that there can be only be one parent, you could use a self-referential foreign key to indicate the parent. The following is easier to explain if a Corporation and a person are considered the same entity:
ENTITY table
entity_id
(primary key)entity_type_code
(foreign key toENTITY_TYPE_CODE.entity_type_code
, containing "person" and "corporation", etc)parent_entity_id
(foreign key relationship withENTITY.entity_id
, nullable)The
parent_entity_id
also has to be nullable, because NULL indicates the root entity for a hierarchy. But this also means using a database that has hierarchical query support (IE: not MySQL).OWNERSHIP table
The
OWNERSHIP
table would be the approach to take if the business rules need to support more than one parent to an entity, or vice versa. There are numerous names, but it's a table solely for supporting many-to-many relationships. You are correct, it is the ideal approach if the business rules require it.Conclusion
The key to data modeling is constructing a model based on the business rules. The model shouldn't change often, so try to futureproof what you can--business analysts are key to getting this done.