复合主键
我正在设计一个数据库,该数据库将用于存储来自多个不同来源的数据。我存储的实例由原始来源分配了唯一的 ID。我存储的每个实例都应包含有关其来源的信息,以及与该来源关联的 ID。
例如,请考虑下表来说明该问题:
----------------------------------------------------------------
| source_id | id_on_source | data |
----------------------------------------------------------------
| 1 | 17600 | ... |
| 1 | 17601 | ... |
| 2 | 1 | ... |
| 3 | 1 | ... |
----------------------------------------------------------------
请注意,虽然每个源的 id_on_source
都是唯一的,但对于不同的源也可能会找到相同的 id_on_source
。来源。
我对关系数据库有相当的了解,但距离专家甚至经验丰富的用户还很远。我在这个设计中面临的问题是我应该使用什么作为主键。数据似乎规定了复合主键 (source_id, id_on_source)
的使用。经过一番谷歌搜索后,我发现了一些关于复合主键利弊的激烈争论,这让我有点困惑。
该表将与其他表具有一对多关系,因此将在其他表的外键中引用。
我没有依赖于特定的 RDBMS,并且我不确定它对于论证是否重要,但假设我更喜欢使用 SQLite 和 >MySQL
。
在这种情况下使用复合外键有哪些优点和缺点?您更喜欢哪一个?
I am working on the design of a database that will be used to store data that originates from a number of different sources. The instances I am storing are assigned unique IDs by the original sources. Each instance I store should contain information about the source it came from, along with the ID it was associated by this source.
As an example, consider the following table that illustrates the problem:
----------------------------------------------------------------
| source_id | id_on_source | data |
----------------------------------------------------------------
| 1 | 17600 | ... |
| 1 | 17601 | ... |
| 2 | 1 | ... |
| 3 | 1 | ... |
----------------------------------------------------------------
Note that while the id_on_source
is unique for each source, it is possible for the same id_on_source
to be found for different sources.
I have a decent understanding of relational databases, but am far from an expert or even an experienced user. The problem I face with this design is what I should use as primary key. The data seems to dictate the use of a composite primary key of (source_id, id_on_source)
. After a little googling I found some heated debates on the pros and cons of composite primary keys however, leaving me a little confused.
The table will have one-to-many relationship with other tables, and will thus be referred to in the foreign keys of other tables.
I am not tied to a specific RDBMS
and I am not sure if it matters for the sake of the argument, but let's say that I prefer to work with SQLite
and MySQL
.
What are the pros and cons of using a composite foreign key in this case? Which would you prefer?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
我个人认为复合主键很痛苦。对于您希望加入“源”表的每个表,您需要添加 source_id 和 id_on_source 字段。
我将在源表上创建一个标准的自动递增主键,并在 source_id 和 id_on_source 列上添加唯一索引。
然后,您可以仅添加源表的 id 作为其他表的外键。
一般来说,我还发现许多框架和工具产品中对复合主键的支持充其量是“零散的”,而在其他框架和工具产品中则不存在
I personally find composite primary keys to be painful. For every table that you wish to join to your "sources" table you will need to add both the source_id and id_on_source field.
I would create a standard auto-incrementing primary key on your sources table and add a unique index on source_id and id_on_source columns.
This then allows you to add just the id of the sources table as a foreign key on other tables.
Generally I have also found support for composite primary keys within many frameworks and tooling products to be "patchy" at best and non-existent in others
复合键难以管理且连接速度慢。由于您正在构建汇总表,因此请使用代理键(即自动增量/标识列)。将您的自然键列留在那里。
这还有很多其他好处。首先,如果您与一家公司合并,并且他们具有相同的来源之一,但重复使用密钥,那么如果您不使用代理密钥,您就会遇到麻烦。
这是数据仓库中广泛认可的最佳实践(比您正在做的事情要大得多,但仍然相关),并且有充分的理由。代理提供数据完整性和快速连接。使用自然键很快就会被烧毁,因此请远离它们作为标识符,并且仅在导入过程中使用它们。
Composite keys are tough to manage and slow to join. Since you're building a summary table, use a surrogate key (i.e.-an autoincrement/identity column). Leave your natural key columns there.
This has a lot of other benefits, too. Primarily, if you merge with a company and they have one of the same sources, but reused keys, you're going to get into trouble if you aren't using a surrogate key.
This is the widely acknowledged best practice in data warehousing (a much larger undertaking than what you're doing, but still relevant), and for good reason. Surrogates provide data integrity and quick joins. You can get burned very quickly with natural keys, so stay away from them as an identifier, and only use them on the import process.
您的业务要求是这两个属性的组合是唯一的。因此,您应该对这两个属性有一个
UNIQUE
约束。无论您将UNIQUE
约束称为“主要”约束,实际上都只是一种偏好,除了文档之外,它不会产生太大影响。唯一的问题是您是否随后添加一个额外的列并将其标记为
UNIQUE
。我认为这样做的唯一原因是性能,这是一个合理的原因。就我个人而言,我不喜欢将每个数据库本质上转变为图表的方法,其中生成的列本质上是指针,而您只是从一个数据库遍历到下一个数据库。我认为这抛弃了关系系统的所有优点。如果您退后一步思考一下,就会发现您正在引入一堆对您的业务毫无意义的专栏。您可能对我的相关博文感兴趣。
You have a business requirement that the combination of those two attributes are unique. So, you should have a
UNIQUE
constraint on those two attributes. Whether you call thatUNIQUE
constraint "primary" is really just a preference, it doesn't have much impact aside from documentation.The only question is whether you then add an extra column and mark it
UNIQUE
. The only reason I can see to do that is performance, which is a legitimate reason.Personally, I don't like the approach of turning every database into essentially a graph, where the generated columns are essentially pointers and you are just traversing from one to the next. I think that throws away all of the greatness of a relational system. If you step back and think about it, you're introducing a bunch of columns that have no meaning to your business, at all. You may be interested in my related blog post.
我相信组合键创建了一个非常自然和描述性的数据模型。我的经验来自Oracle,我认为创建复合PK时不存在任何技术问题。事实上,任何分析数据字典的人都会立即了解有关该表的一些信息。在您的情况下,很明显每个 source_id 必须具有唯一的 id_on_source 。
自然键的使用经常引起激烈的争论,但从良好的数据模型的角度来看,与我一起工作的人喜欢自然键。
I believe that composite keys create a very natural and descriptive data model. My experience comes from Oracle and I don't think there is any technical issues when creating a composite PK. In fact anyone analysing the data dictionary would immediately understand something about the table. In your case it would be obvious that each source_id must have unique id_on_source.
The use of natural keys often creates a hot debate, but people whom I work with like natural keys from a good data model perspective.
几乎我唯一一次使用复合主键是当该键的高位部分是另一个表的键时。例如,我可能会创建一个主键为 OrderId + LineNumber 的 OrderLineItem 表。由于对 OrderLineItem 表的许多访问都是“order join orderlineitem using (orderid)”或其某种变体,因此这通常很方便。当查看数据库转储时,它还可以轻松地找出哪些行项目连接到什么顺序。
正如其他人所指出的,在大多数其他情况下,复合键是一种痛苦,因为您的连接必须涉及所有部分。输入的次数更多,这意味着出错的可能性更大,查询速度更慢等等。
两部分键还不错;我经常这样做。我不愿意使用三部分密钥。超过三部分,我会说忘记它。
在您的示例中,我怀疑使用复合键几乎没有什么好处。只需发明一个新的序列号,并让源和源密钥为普通属性即可。
Pretty much the only time I use a composite primary key is when the high-order part of the key is the key to another table. For example, I might create an OrderLineItem table with a primary key of OrderId + LineNumber. As many accesses against the OrderLineItem table will be "order join orderlineitem using (orderid)" or some variation of that, this is often handy. It also makes it easy when looking at database dumps to figure out what line items are connected to what order.
As others have noted, composite keys are a pain in most other circumstances because your joins have to involve all the pieces. It's more to type which means more potential for mistakes, queries are slower, etc.
Two-part keys aren't bad; I do those fairly often. I'm reluctant to use a three-part key. More than three-parts, I'd say forget it.
In your example, I suspect there's little to be gained by using the composite key. Just invent a new sequence number and let the source and source key be ordinary attributes.
添加额外的 ID 列将使您必须强制执行两个唯一性约束,而不是一个。
使用额外的 ID 列作为其他引用表中的外键,而不是自然呈现的键,将导致您必须执行更多联接,即在您需要原始 soruce_ID 加上 ID_on_source 以及来自的数据的所有情况下参考表。
Adding an extra ID column will leave you having to enforce TWO uniqueness constraints instead of one.
Using that extra ID column as the foreign key in other referencing tables, instead of the key that presents itself naturally, will cause you to have to do MORE joins, namely in all the cases where you need the original soruce_ID plus ID_on_source along with data from the referencing table.
我在使用大量复合键时遇到了问题,因此我不会推荐它(更多内容见下文),我还发现在尝试回滚用户错误时独立/代理键(而不是自然键)有好处。
问题是,通过一组关系,一个表连接了两个表,其中组合的每一行部分都是相同的(这在第三范式中是合适的 - 父级的两个部分之间的比较)。我在连接表中删除了复合关系的那部分重复项(因此,不是parent1ID、other1ID、parent2ID、other2ID,而是parentID、other1ID、other2ID),但现在该关系无法更新对主键的更改,因为它尝试了每条路线都做了两次,中间失败了。
I ran into problems using a lot of composite keys and so I wouldn't recommend it (more below), I've also found there to be benefits in an independent/surrogate key (rather than natural) when trying to roll back user mistakes.
The problem was that via a set of relations, one table joined two tables where for each row part of the composite was the same (this was appropriate in 3rd normal form - a comparison between two parts of a parent). I de-duplicated that part of the composite relationship in the join table (so instead of parent1ID, other1ID, parent2ID, other2ID there was parentID, other1ID, other2ID) but now the relation couldn't update changes to the primary key, because it tried to do it twice via each route and failed in the middle.
有些人建议您使用全局唯一 ID (GUID):合并复制和事务复制通过更新订阅,使用 uniqueidentifier 列来保证在表的多个副本中唯一标识行。如果该值在创建时是全局唯一的,则无需添加 source_id 来使其唯一。
尽管 uniqueid 是一个很好的主键,但我同意使用不同的自然(不一定是唯一)键作为聚集索引通常更好。例如,如果 uniqueid 是标识员工的 PK,您可能希望将聚集索引作为部门(如果您的 select 语句通常检索给定部门内的所有员工)。如果您确实想使用 unqiqueid 作为聚集索引,请参阅 NEWSEQUENTIALID() 函数:这会创建连续的 uniqueid 值,这些值(连续的)具有更好的集群性能。
Some people recommend you use a Globally Unique ID (GUID): merge replication and transactional replication with updating subscriptions use uniqueidentifier columns to guarantee that rows are uniquely identified across multiple copies of the table. If the value if globally unique when it's created, then you don't need to add the source_id to make it unique.
Although a uniqueid is a good primary key, I agree that it's usually better to use a different, natural (not necessarily unique) key as your clustered index. For example if a uniqueid is the PK which identifies employees, you might want to clustered index to be the department (if your select statements usually retrieve all employees within a given department). If you do want to use a unqiqueid as the clustered index, see the NEWSEQUENTIALID() function: this creates sequential uniqueid values, which (being sequential) have better clustering performance.