当前位置：文江博客话题详情

SQL 架构：只有一个表存储多个实体类型是否合理？（使用自连接）

发布于 2024-09-25 08:17:19 字数 2245 浏览 6 评论 0原文

我很少遇到这样的情况：多个实体类型的单个表似乎比每个实体类型一个表更好。这是一个对我来说有意义的例子，但从学术上来说它似乎是错误的。

问题：我可以做到这一点并且仍然拥有“健全”的架构吗？

示例如下

假设两种实体类型：公司和个人。公司通常由一个人拥有，但有时另一家公司拥有一家公司。

坚持这一想法并加以补充，假设每个公司都有一名注册代理人，负责公司的合法创建。进一步说明，注册代理人可以是个人也可以是公司。

如果您认为公司[子]的所有者[母公司]可以是个人或公司，您可能会开始看到保持第三范式和避免冗余的挑战。

与我的示例相反，如果只有人们可以拥有公司，则所有权链接表非常传统，具有以下列：OwnershipID（有点不必要）、CorporationID、PersonID。

相反，您需要类似：OwnershipID、CorporationID、OwnerID、OwnerEntityType（公司或个人）。不要误会我的意思，您可以完成这项工作，但至少可以说它不会很有趣。

继续我给出的示例，您需要为每个公司分配一个代理。通常，代理人是业主之一（一个人）。在这种情况下，您确实希望链接回该人的一条记录。您不希望将该人记录为所有者，然后再次记录为代理（在代理表中）。那是多余的。

与该“问题”类似，注册代理人也可以是一家公司，例如律师事务所、注册会计师或商业申报公司，仅举一些典型的例子。就像代理人一样，代理人公司确实不应该拥有自己作为代理人实体的记录。相反，它需要链接回公司表中其公司存在的信息。 [除了我最终说没有 CorporationEntity 表]

就像将每个公司与其任何类型、个人或公司的所有者相匹配的链接表一样，您可以拥有一个代理链接表： AgentRepresentationID、CorporationID 、 AgentID、 AgentType...但同样，当您必须将相关代理放在一起（一些来自 Person 表，一些来自 Corporation 表）时，这会很丑陋（IMO）。

这就是为什么我说，在这种情况下，您可以看到中立实体类型如何具有优势。它会是这样的：

表：EntityAll
关键列：EntityID、EntityType（或 EntityTypeID，如果您坚持，请链接出来以获取描述）、EntityName（涉及名称和不同类型……与本文主题无关）

链接表：CorporationOwnership
关键列：OwnershipID（我的评论是，这是不必要的）、ChildEntityID（所拥有的实体；为清楚起见，命名为“Child”，我不会这样命名）ParentEntityID（父实体）

链接表：代理代表
关键列：AgentRepresentationID（...我不会说）、CorporationEntityID（所代表的公司实体）、AgentEntityID（来自实体表，相当于此处代理的记录）

虽然您可能对我的架构感到满意，您应该对链接表中的列命名感到有点困扰。这让我很烦恼。通常，这些表中的第二列和第三列名称与您在每个实体各自的表中加入的列名称完全匹配（哈哈，但每个实体没有各自的表，因此您不能让链接表列名称与源列名称，因为它们是同一列）。从技术上讲，这并不重要，但它会打破你的命名约定，这确实很重要，但还不足以不这样做。

如果我还没有足够好地把它带回家，您可以按照以下方法将其组合在一起。您可以自行加入 EntityAll 表以获得所需的内容。

列出所有军团及其所有者（在 T-SQL 中）：

SELECT Corp.EntityName as CorpName, Owner.EntityName as OwnerName
FROM EntityAll as Corp
JOIN CorporationOwnership as Link on (Corp.EntityID = Link.ChildEntityID)
JOIN EntityAll as Owner on (Link.ParentEntityID = Owner.EntityID)

因此，您将执行相同的操作来返回代理，而不是所有者。

我意识到这不是我们被训练来构建表的方式，但我强烈地感觉到我的解决方案消除了冗余数据，并使编码、管理和阅读变得更容易。

PS 我最近提供了这个例子作为对 SO 的一个老问题的答案。这是一个老问题，没有对话。我需要实现这个例子，并且我很好奇这种架构的影响。

这是之前的问题/答案：良好的数据库表设计：一个表混合不同的实体或每个实体单独的表

原文

I rarely come across a situation where a single table for multiple entity types seems better than one table per entity type. Here's an example that makes sense to me, but academically it seems wrong.

QUESTION: Can I do this and still have a "sound" architecture?

Example Follows

Suppose two entity types, a corporation and a person. A corporation is typically owned by a person, but sometimes another corporation owns a corporation.

Holding onto that thought, and adding to it, let's say that every corporation has a registered agent attached to it who is responsible for the legal creation of the corporation. Furthering my illustration, the registered agent can be either a person or a corporation.

If you consider that the owner [parent] to the corporation [child] can be either a person or a corporation, you may begin to see the challenge in keeping the third normal form and avoiding redundancy.

In contrast to my example, if only people could own Corporations, the Ownership link table is very conventional, having columns: OwnershipID (sort of unecessary), CorporationID, PersonID.

Instead, you need something like: OwnershipID, CorporationID, OwnerID, OwnerEntityType (corp or person). Don’t get me wrong, you can make this work, but it won't be fun, to say the least.

Continuing on with the example I gave, you need to assign an agent to every Corporation. Usually, the agent is one of the owners (a person). In which case, you really do want to link back to the one record of that person. You don't want to have record of the person as an owner and then again as an agent (in an Agent table). That would be redundant.

Similarly to that "problem" a registered agent can also be a corporation, such as a law firm, a CPA, or a biz filings company, to name some typical examples. Just like the agent-person, an agent-corporation really should not get its own record as an agent-entity. Instead it needs to link back to the of its corporate existence in the Corporation table. [except that I'm ultimately saying to not have a CorporationEntity table]

Just like the link table that matched each corporation to its owner(s) of any type, person or corporation, you could have an agent link table of: AgentRepresentationID, CorporationID, AgentID, AgentType... but again, it would be ugly (IMO) when you have to pull together the related agents -- some from the Person table, some from the Corporation table.

This is why I say, in cases like this, you can see how a neutral entity type can be advantageous. It would be something like this:

Table: EntityAll
Key Columns: EntityID, EntityType (or EntityTypeID if you insist, link out to get the description), EntityName (there are concerns with names and different types... off topic to this post)

Link Table: CorporationOwnership
Key Columns: OwnershipID (again, my comment that this is kind of unecessary), ChildEntityID (the entity being owned; named "Child" for clarity, I wouldn't name it that) ParentEntityID (the parent entity)

Link Table: AgentRepresentation
Key Columns: AgentRepresentationID (...I won't say it), CorporationEntityID (the corp entity being represented), AgentEntityID (from the Entity table, equating to the record that's the agent here)

While you might be OK with my architecture, you should be a little bothered by the column naming in the link tables. It bothers me. Typically the second and third column names in those tables match exactly the name of the columns you JOIN in each entity's respective table (haha, but each entity doesn't have a respective table so you can't have the link table column names match the source column names because they are THE same column). Technically this does not matter, but it will break your naming conventions which does matter, but not enough to not do it.

In case I haven't driven it home well enough yet, here is how you'll pull it together. You JOIN the EntityAll table on its own self to get what you need.

List all Corps and their owners (in T-SQL):

SELECT Corp.EntityName as CorpName, Owner.EntityName as OwnerName
FROM EntityAll as Corp
JOIN CorporationOwnership as Link on (Corp.EntityID = Link.ChildEntityID)
JOIN EntityAll as Owner on (Link.ParentEntityID = Owner.EntityID)

Consequently, you'd do the same thing to return the agent, instead of the owner(s).

I realize this is not how we're trained to architect tables, but I feel pretty strongly that my solution eliminates redundant data AND makes it easier to code, manage and read.

P.S. I recently provided this example as an answer to an old question on SO. Being an old question, there was no dialogue. I need to implement this very example, and I’m curious about the ramifications of this architecture.

Here is the previous question/answer:
Good db table design: One table mixing different entities or separate table for each entity

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

赠我空喜 2024-10-02 08:17:19

我认为休·达文 (Hugh Darwen) 创造了术语“分布式键”和“分布式外键”，其中单个引用的键值恰好存在于多个引用相关变量（表）之一中；这将需要“多重赋值”的相关概念，以便原子地插入到引用和引用相关变量中。

虽然理论上这可以在 SQL-92 中使用可延迟模式级 ASSERTION（或者可能支持子查询的 CHECK 约束）来实现，但这是一个相当笨拙的过程，是程序性的（而不是基于集合）并且没有任何 SQL 产品曾经支持此功能（或者我怀疑永远不会）。

对于可用的 SQL 产品，我们能做的最好的事情就是使用复合键 (entity_ID,entity_type)，并在引用时对 entity_type 施加 CHECK 约束例如，

CREATE TABLE LegalPersons
(
 person_ID INTEGER IDENTITY NOT NULL UNIQUE, 
 person_type VARCHAR(14) NOT NULL
    CHECK (person_type IN ('Company', 'Natural Person')), 
 UNIQUE (person_type, person_ID)
);

CREATE TABLE Companies
(
 person_ID INTEGER NOT NULL UNIQUE, 
 person_type VARCHAR(14) NOT NULL
    CHECK (person_type = 'Company'), 
 FOREIGN KEY (person_type, person_ID)
    REFERENCES LegalPersons (person_type, person_ID), 
 companies_house_registered_number VARCHAR(8) NOT NULL UNIQUE
 -- other company columns and constraints here
);

CREATE TABLE NaturalPersons
(
 person_ID INTEGER NOT NULL UNIQUE, 
 person_type VARCHAR(14) NOT NULL
    CHECK (person_type = 'Natural Person'), 
 FOREIGN KEY (person_type, person_ID)
    REFERENCES LegalPersons (person_type, person_ID) 
 -- natural person columns and constraints here
);

这种超类-子类模式在 SQL 中非常常见。

理想情况下，表名称应反映整个集合的性质。你们可能需要超越其他集合名称的组合来思考；也许询问特定业务领域的专家，例如会计师可能使用术语“工资单”而不是“员工薪水”。

另一个理想的情况是列的名称在整个架构中保持相同，但使用子类化方法时，您经常需要限定它们（这让我很烦恼！）例如，

CREATE TABLE CompanyAgents
(
 company_person_ID INTEGER NOT NULL UNIQUE, 
 company_person_type VARCHAR(14) NOT NULL
    CHECK (company_person_type = 'Company'), 
 FOREIGN KEY (company_person_type, company_person_ID)
    REFERENCES LegalPersons (person_type, person_ID), 
 agent_person_ID INTEGER NOT NULL, 
 agent_person_type VARCHAR(14) NOT NULL, 
 FOREIGN KEY (agent_person_type, agent_person_ID)
    REFERENCES LegalPersons (person_type, person_ID), 
 CHECK (company_person_ID <> agent_person_ID)
);

注意我会为 agent_person_ID 例如

 agent_person_ID INTEGER NOT NULL
    REFERENCES LegalPersons (person_ID)

因为对实体类型没有限制。原则上，我觉得为整个架构中的所有引用保留两列复合键会更好，而且我发现在实践中我经常不需要现在的实体类型，所以这个 SQL DDL 正在保存一个 JOIN 在 SQL DML 中:)

I think it was Hugh Darwen who coined the terms 'distributed key' and 'distributed foreign keys', where a single referenced key value exists in exactly one of multiple referencing relvars (tables); this would require a related concept of 'multple assignment' in order to atomically insert to both the referenced and referencing relvars.

While this could in theory be achieved in SQL-92 using deferrable schema-level ASSERTIONs (or perhaps CHECK constraints that support subqueries), it's rather a clunky process, is procedural (rather than set-based) and there isn't a SQL product that has ever support this functionality (or ever will, I susepct).

The best we can do with available SQL products is to use a compound key (entity_ID, entity_type) with a CHECK constraint on the entity_type in referencing tables to ensure there is no more than one referencing key value (note this is not the same as 'exactly one referencing key value') e.g.

CREATE TABLE LegalPersons
(
 person_ID INTEGER IDENTITY NOT NULL UNIQUE, 
 person_type VARCHAR(14) NOT NULL
    CHECK (person_type IN ('Company', 'Natural Person')), 
 UNIQUE (person_type, person_ID)
);

CREATE TABLE Companies
(
 person_ID INTEGER NOT NULL UNIQUE, 
 person_type VARCHAR(14) NOT NULL
    CHECK (person_type = 'Company'), 
 FOREIGN KEY (person_type, person_ID)
    REFERENCES LegalPersons (person_type, person_ID), 
 companies_house_registered_number VARCHAR(8) NOT NULL UNIQUE
 -- other company columns and constraints here
);

CREATE TABLE NaturalPersons
(
 person_ID INTEGER NOT NULL UNIQUE, 
 person_type VARCHAR(14) NOT NULL
    CHECK (person_type = 'Natural Person'), 
 FOREIGN KEY (person_type, person_ID)
    REFERENCES LegalPersons (person_type, person_ID) 
 -- natural person columns and constraints here
);

This superclass-subclass pattern is very common in SQL.

Ideally, a table name should reflect the nature of the set as a whole. You many need to think beyond a compound of other sets' names; perhaps ask a expert in the particular field of business e.g. an accountant may use the term 'payroll' rather than 'EmployeesSalaries'.

Another ideal is for a column's name to remain the same throughout the schema but with a subclassing approach you often need to qualify them (and this bothers me!) e.g.

CREATE TABLE CompanyAgents
(
 company_person_ID INTEGER NOT NULL UNIQUE, 
 company_person_type VARCHAR(14) NOT NULL
    CHECK (company_person_type = 'Company'), 
 FOREIGN KEY (company_person_type, company_person_ID)
    REFERENCES LegalPersons (person_type, person_ID), 
 agent_person_ID INTEGER NOT NULL, 
 agent_person_type VARCHAR(14) NOT NULL, 
 FOREIGN KEY (agent_person_type, agent_person_ID)
    REFERENCES LegalPersons (person_type, person_ID), 
 CHECK (company_person_ID <> agent_person_ID)
);

Note I would have used a single column key for agent_person_ID e.g.

 agent_person_ID INTEGER NOT NULL
    REFERENCES LegalPersons (person_ID)

because there is no restriction on entity type. In principle I feel better about retaining the two-column compound key for all references throughout the schema and I find in practice as often as not I need to now the entity type anyhow so this SQL DDL is saving a JOIN in SQL DML :)

回复收藏 0 原文

骄兵必败 2024-10-02 08:17:19

查找“泛化专业化关系建模”。

我认为有一种实体，我称之为“法人”。你所说的“人”，有些人可能称之为“自然人”，是一种特殊的法人。你所说的“公司”，有些人可能称之为“法人”，是一种不同类型的专门法人。

这样看来，“法人”、“人”和“公司”之间的关系可以被视为一种“gen-spec”（广义-专业化）模式。 gen-spec 在对象建模教程中得到了很多处理，并且非常自然地符合继承的概念。 gen-spec 在关系建模教程中经常被掩盖。但这个概念很好理解。

法人可以拥有一家公司，无论法人属于哪种专业类型。

您的 ENTITYALL 表符合 gen-spec 关系设计的一些功能，但您可以进一步开发该模型。特别是，如果我们有一个实体类型“automobile”，则没有特殊原因无法在 ENTITYALL 表中获得条目。但汽车不能拥有一家公司的事实现在已被掩盖。我想要某种将“人”和“公司”概括为“法人”的表格，但又不那么笼统以至于将“汽车”归类为“法人”。 ENTITYALL 对于我的喜好来说太通用了。

看看 gen-spec 的最佳示例，我们发现 gen 表的主键和每个 spec 表的主键都来自同一域。此外，除了维护专用实体的实体完整性之外，spec 表的主键还充当对 gen 表的外键引用。事实证明连接非常好。您的模式可以从这个设计技巧中受益。

回复收藏 0 原文