为什么我读到这么多关于使用复合键的负面意见?
我正在开发一个喜欢自动编号标识符的 Access 数据库。每个表都使用它们,除了一个表使用由一个人的名字、姓氏和出生日期组成的键。不管怎样,人们开始遇到很多重复问题,因为表示关系的表可能会两次或更多次保存相同的关系。我决定通过为关系表实现复合键来解决这个问题,从那以后我就没有遇到重复的问题。
所以我想知道 Access 世界中复合键的坏名声是怎么回事?我想编写查询会稍微困难一些,但至少您不必每次在前端输入数据甚至编辑数据时都进行大量检查。他们的效率是不是极其低效或者什么?
I was working on an Access database which loved auto-numbered identifiers. Every table used them except one, which used a key made up of the first name, last name and birthdate of a person. Anyways, people started running into a lot of problems with duplicates, as tables representing relationships could hold the same relationship twice or more. I decided to get around this by implementing composite keys for the relationship tables and I haven't had a problem with duplicates since.
So I was wondering what's the deal with the bad rep of composite keys in the Access world? I guess it's slightly more difficult to write a query, but at least you don't have to put in place tons of checks every time data is entered or even edited in the front end. Are they incredibly super inefficient or something?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(10)
复合键对于单个表来说效果很好,但是当您开始在表之间创建关系时,它可能会变得有点多。
考虑两个表
Person
和Event
,以及它们之间称为Appointment
的多对多关系。如果
Person
表中有一个由名字、姓氏和出生日期组成的复合键,并且在Event
表中有一个由地点和位置组成的复合键name,您将在Appointment
表中获得五个字段来标识关系。绑定关系的条件将相当长:
另一方面,如果您有
Person
和Event
表的自动编号键,则只需要Appointment
表来标识关系,并且条件要小很多:A composite key works fine for a single table, but when you start to create relations between tables it can get a bit much.
Consider two tables
Person
andEvent
, and a many-to-many relations between them calledAppointment
.If you have a composite key in the
Person
table made up of the first name, last name and birth date, and a compossite key in theEvent
table made up of place and name, you will get five fields in theAppointment
table to identify the relation.A condition to bind the relation will be quite long:
If you on the other hand have auto-numbered keys for the
Person
andEvent
tables, you only need two fields in theAppointment
table to identify the relation, and the condition is a lot smaller:如果您仅使用纯自写的
SQL
来访问您的数据,那么它们是可以的。然而,一些
ORM
、适配器等需要有一个PK
字段来标识记录。另请注意,复合主键几乎总是自然键(创建代理复合键几乎没有意义,您也可以使用单字段复合主键)。
复合主键最常见的用法是多对多链接表。
使用自然键时,应确保它们本质上是唯一的且不可变,即实体一旦被映射后,始终由相同的键值来标识。模型,任何值都只能标识一个实体。
你的情况并非如此。
首先,一个人可以改变自己的名字,甚至生日。
其次,我可以很容易地想象两个
John Smiths
在同一天出生。前者意味着,如果一个人更改了姓名,则必须在每个引用
persons
的表中进行更新;后者意味着第二个John Smith
将无法将其添加到您的数据库中。对于像您这样的情况,我真的会考虑向您的模型添加代理标识符。
If you only use pure self-written
SQL
to access your data, they are OK.However, some
ORM
s, adapters etc. require having a singlePK
field to identify a record.Also note that a composite primary key is almost invariably a natural key (there is hardly a point in creating a surrogate composite key, you can as well use a single-field one).
The most common usage of a composite primary key is a many-to-many link table.
When using the natural keys, you should ensure they are inherently unique and immutable, that is an entity is always identified by the same value of the key, once been reflected by the model, and only one entity can be identified by any value.
This it not so in your case.
First, a person can change their name and even the birthdate
Second, I can easily imagine two
John Smiths
born at the same day.The former means that if a person changes their name, you will have to update it in each and every table that refers to
persons
; the latter means that the secondJohn Smith
will not be able to make it into your database.For the case like yours, I would really consider adding a surrogate identifier to your model.
不幸的是,这些负面意见的原因之一可能是无知。太多人没有正确理解候选键的概念。有些人似乎认为每个表只需要一个密钥,一个密钥足以保证数据完整性,并且选择一个密钥才是最重要的。
我经常推测,完全弃用并逐步淘汰“主键”一词的使用将是一件好事。这样做将使数据库设计者集中精力于真正的问题:表应该具有尽可能多的键以确保数据的正确性,并且其中一些键可能是复合的。废除主键概念将消除所有关于主键应该是什么或不应该是什么的愚蠢争论。
Unfortunately one reason for those negative opinions is probably ignorance. Too many people don't understand the concept of Candidate Keys properly. There are people who seem to think that every table needs only one key, that one key is sufficient for data integrity and that choosing that one key is all that matters.
I have often speculated that it would be a good thing to deprecate and phase out the use of the term "primary key" altogether. Doing that would focus database designers minds on the real issue: that a table should have as many keys as are necessary to ensure the correctness of the data and that some of those keys will probably be composite. Abolishing the primary key concept would do away with all those fatuous debates about what the primary key ought to be or not be.
如果您的 RDBMS 支持它们并且您正确(且一致)地使用它们,则复合 PK 上的唯一键应该足以避免重复。至少在 SQL Server 中,您还可以针对唯一键而不是 PK 创建 FK,这可能很有用。
单个“id”列(或代理键)的优点是它可以通过缩小键来提高性能。由于此键可以作为 FK 列携带到该表上的索引(作为从索引行返回到物理行的指针)和其他表,这可以减少空间并提高性能。不过,这在很大程度上取决于 RDBMS 的特定架构。不幸的是,我对 Access 不太熟悉,无法对此发表评论。
正如 Quassnoi 指出的,一些 ORM(以及其他第三方应用程序、ETL 解决方案等)不具备处理复合键的能力。不过,除了一些 ORM 之外,大多数最新的第三方应用程序都将支持复合键。不过,总体而言,ORM 在采用这一点方面要慢一些。
我个人对组合键的偏好是,虽然唯一索引可以解决重复的问题,但我还没有看到真正完全使用它们的开发商店。大多数开发人员都对此感到懒惰。他们添加一个自动递增的 ID 并继续前进。然后,六个月后,他们付给我很多钱来解决他们的重复数据问题。
另一个问题是自动递增 ID 通常不可移植。当然,您可以在系统之间移动它们,但由于它们在现实世界中没有实际基础,因此不可能根据实体的其他所有信息来确定它们。这在 ETL 中成为一件大事。
主键在数据建模领域是非常重要的事情,如果您希望数据一致且干净,那么它们通常值得更多考虑,“添加自动递增 ID”。
代理键也很有用,但当我尝试处理已知的性能问题时,我更喜欢使用它们。否则,这就是浪费时间去解决你可能根本没有的问题的典型问题。
最后一点……关于交叉引用表(或一些人所说的连接表),除非 ORM 需要,否则添加代理键有点愚蠢(在我看来)。
If your RDBMS supports them and if you use them correctly (and consistently), unique keys on the composite PK should be sufficient to avoid duplicates. In SQL Server at least, you can also create FKs against a unique key instead of the PK, which can be useful.
The advantage of a single "id" column (or surrogate key) is that it can improve performance by making for a narrower key. Since this key may be carried to indexes on that table (as a pointer back to the physical row from the index row) and other tables as a FK column that can decrease space and improve performance. A lot of it depends on the specific architecture of your RDBMS though. I'm not familiar enough with Access to comment on that unfortunately.
As Quassnoi points out, some ORMs (and other third party applications, ETL solutions, etc.) don't have the capability to handle composite keys. Other than some ORMs though, most recent third party apps worth anything will support composite keys though. ORMs have been a little slower in adopting that in general though.
My personal preference for composite keys is that although a unique index can solve the problem of duplicates, I've yet to see a development shop that actually fully used them. Most developers get lazy about it. They throw on an auto-incrementing ID and move on. Then, six months down the road they pay me a lot of money to fix their duplicate data issues.
Another issue, is that auto-incrementing IDs aren't generally portable. Sure, you can move them around between systems, but since they have no actual basis in the real world it's impossible to determine one given everything else about an entity. This becomes a big deal in ETL.
PKs are a pretty important thing in the data modeling world and they generally deserve more thought then, "add an auto-incrementing ID" if you want your data to be consistent and clean.
Surrogate keys are also useful, but I prefer to use them when I have a known performance issue that I'm trying to deal with. Otherwise it's the classic problem of wasting time trying to solve a problem that you might not even have.
One last note... on cross-reference tables (or joining tables as some call them) it's a little silly (in my opinion) to add a surrogate key unless required by an ORM.
复合键不仅是复合主键,也是复合外键。我这么说是什么意思?我的意思是,引用回原始表的每个表都需要复合键中每一列的一列。
这是一个简单的示例,使用通用的学生/班级安排。
人
名字
姓氏
地址
类别
类名
讲师名字
讲师姓氏
导师地址
MeetingTime
StudentClass - 多对多连接表
学生名字
学生姓
学生地址
类名
讲师名字
讲师姓氏
导师地址
MeetingTime
您刚刚从使用代理键的 2 列多对多表转变为使用复合键的 8 列多对多表,因为它们具有 3 列和 5 列外键。您无法真正删除任何这些字段,因为这样记录就不会是唯一的,因为学生和教师都可能有重复的姓名。哎呀,如果你有两个来自同一地址、同名的人,你仍然有严重的麻烦。
Composite Keys are not just composite primary keys, but composite foreign keys as well. What do I mean by that? I mean that each table that refers back to the original table needs a column for each column in the composite key.
Here's a simple example, using a generic student/class arrangement.
Person
FirstName
LastName
Address
Class
ClassName
InstructorFirstName
InstructorLastName
InstructorAddress
MeetingTime
StudentClass - a many to many join table
StudentFirstName
StudentLastName
StudentAddress
ClassName
InstructorFirstName
InstructorLastName
InstructorAddress
MeetingTime
You just went from having a 2-column many-to-many table using surrogate keys to having an 8-column many-to-many table using composite keys, because they have 3 and 5 column foreign keys. You can't really get rid of any of these fields, because then the records wouldn't be unique, since both students and instructors can have duplicate names. Heck, if you have two people from the same address with the same name, you're still in serious trouble.
在我看来,这里给出的大多数答案并不是由定期使用 Access 的人给出的,因此我将从这个角度插话(尽管我会重复其他一些人所说的内容,只是带有一些特定于 Access 的注释)。
仅当没有单列候选键时,我才使用代理键。这意味着我有带有代理 PK 和单列自然 PK 的表,但没有复合键(除了在连接中,它们是两个 FK 的组合,代理或自然并不重要)。
Jet/ACE集群在PK上,并且仅在PK上。这具有潜在的缺点和潜在的好处(例如,如果您将随机自动编号视为 PK)。
根据我的经验,复合 PK 的非空要求使得大多数自然键在不使用可能有问题的默认值的情况下变得不可能。它同样会破坏您在 Jet/ACE 中的唯一索引,因此在 Access 应用程序中(2010 年之前),您最终会在应用程序中强制执行唯一性。从 A2010 开始,可以使用表级数据宏(其工作方式类似于触发器)将该逻辑移至数据库引擎中。
复合键可以帮助您避免联接,因为它们会重复使用代理键您必须通过联接从源表获取的数据。虽然连接可能很昂贵,但主要是外部连接会消耗性能,并且只有使用非必需的 FK 时,您才能获得避免外部连接的全部好处。但是这么多的数据重复总是让我很困扰,因为它似乎违背了我们所学的关于规范化的一切!
正如我上面提到的,我的应用程序中唯一的复合键位于 N:N 连接表中。我永远不会向连接表添加代理键除非在相对罕见的情况下,连接表本身就是相关表的父级(例如,个人/公司 N:N 记录可能具有相关JobTitles,即同一公司内的多个职位)。您无需将复合键存储在子表中,而是存储代理键。不过,我可能不会将代理键作为 PK——我会将复合 PK 保留在一对 FK 值上。我只需添加一个带有唯一索引的自动编号,用于连接到子表。
我想到的再补充。
Most of the answers given here don't seem to me to be given by people who work with Access on a regular basis, so I'll chime in from that perspective (though I'll be repeating what some of the others have said, just with some Access-specific comments).
I use surrogate a key only when there is no single-column candidate key. This means I have tables with surrogate PKs and with single-column natural PKs, but no composite keys (except in joins, where they are the composite of two FKs, surrogate or natural doesn't matter).
Jet/ACE clusters on the PK, and only on the PK. This has potential drawbacks and potential benefits (if you consider a random Autonumber as PK, for instance).
In my experience, the non-Null requirement for a composite PK makes most natural keys impossible without using potentially problematic default values. It likewise wrecks your unique index in Jet/ACE, so in an Access app (before 2010), you end up enforcing uniqueness in your application. Starting with A2010, table-level data macros (which work like triggers) can conceivably be used to move that logic into the database engine.
Composite keys can help you avoid joins, because they repeat data that with surrogate keys you'd have to get from the source table via a join. While joins can be expensive, it's mostly outer joins that are a performance drain, and it's only with non-required FKs that you'd get the full benefit of avoiding outer joins. But that much data repetition has always bothered me a lot, since it seems to go against everything we've ever been taught about normalization!
As I mentioned above, the only composite keys in my apps are in N:N join tables. I would never add a surrogate key to a join table except in the relatively rare case in which the join table is itself a parent to a related tables (e.g., Person/Company N:N record might have related JobTitles, i.e., multiple jobs within the same company). Rather than store the composite key in the child table, you'd store the surrogate key. I'd likely not make the surrogate key the PK, though -- I'd keep the composite PK on the pair of FK values. I would just add an Autonumber with a unique index for joining to the child table(s).
I'll add more as I think of it.
它使查询和维护变得复杂。如果您真的对这个主题感兴趣,我建议您查看已经涵盖此主题的帖子数量。这将为您提供比此处任何回复更好的信息。
https://stackoverflow.com/search?q=composite+primary+key
It complicates queries and maintenance. If you are really interested in this subject I'd recommend looking over the number of posts that already cover this. This will give you better info than any one response here.
https://stackoverflow.com/search?q=composite+primary+key
首先,复合键不利于连接的性能。此外,它们对于更新记录来说要糟糕得多,因为您还必须更新所有子记录。最后,很少有复合键实际上是真正好的键。要成为一个好的密钥,它应该是唯一的并且不能改变。您提供的作为复合键使用的示例未通过这两项测试。它不是唯一的(有些人在同一天出生同名),并且姓名经常更改,导致所有子表进行不必要的更新。
至于具有自动生成键的表导致重复,这主要是由于以下几个因素造成的:
无法以任何方式识别为
独特的
对可能的情况创建唯一索引
复合键
它并不试图找到
匹配记录或允许数据
当下拉可能更多时进入
合适的。
这些都不是代理键的错,它们只是表明开发人员不称职。
In the first place composite keys are bad for performance in joins. Further they are much worse for updating records as you have to update all the child records as well. Finally very few composite keys are actually really good keys. To be a good key it should be unique and not be subject to change. The example you gave as a composite key you used fails both tests. It is not unique (there are people with the same name born on the same day) and names change frequently causing much unnecessary updating of all the child tables.
As far as table with autogenrated keys casuing duplicates, that is mostly due to several factors:
can't be identified in any way as
unique
create a unique index on the possible
composite key
which doesn't attempt to find
matching records or which allows data
entry when a pull down might be more
appropriate.
None of those are the fault of the surrogate key, they just indicate incompetent developers.
我认为有些程序员看到了复杂性,但想避免它,而大多数程序员甚至根本不想去寻找复杂性。
让我们考虑一个具有多个候选键的表的常见示例:一个包含列
employee_number
、salary_amount
、start_date 的
和Payroll
表end_date
。四个候选密钥如下:
仅需要强制执行这些密钥之一,即排序密钥。然而,大多数编码员不会想到添加这样一个密钥,更不用说知道如何对其进行编码了。事实上,我敢打赌大多数 Access 编码员都会向表中添加一个递增的自动编号列,使自动编号列成为
主键
,无法为任何候选键添加约束,并且会说服自己他们的桌子有钥匙!I think some coders see the complexity but want to avoid it, and most coders don't even think to look for the complexity at all.
Let's consider a common example of a table that had more than one candidate key: a
Payroll
table with columnsemployee_number
,salary_amount
,start_date
andend_date
.The four candidate keys are as follows:
Only one of those keys are required to be enforced i.e. the sequenced key. However, most coders wouldn't think to add such a key, let alone know how to code it in the first place. In fact, I would wager that most Access coders would add an incrementing autonumber column to the table, make the autonumber column the
PRIMARY KEY
, fail to add constraints for any of the candidate keys and will have convinced themselves that their table has a key!我开始使用 Access,并且对于主键的使用也有过同样的争论。我目前所做的就是两者的结合。如果我需要一个复合键,我会创建一个并将其标记为主键...但我还会创建一个唯一值的单个 ID 字段,将其用作参考。 ID 用作其他表的外部引用,因为它本质上是唯一的,因为组合键也需要是唯一的。要引用主(复合)键,可以通过引用来完成。
创建记录时需要满足复合密钥要求,并且按照唯一的规则生成 ID 密钥。
I started using Access and have had the same debate about the use of primary keys. What I am currently doing is a combination of both. If I need a composite key, I make one and label it as the primary key...but I also create a unique-valued single ID field that I use as a reference. The ID is used as the foreign reference for other tables since it is inherently unique because the composite key also needs to be unique. To refer to the primary (composite) key it can be done by reference.
The composite key requirements need to be satisfied upon record creation and the ID key gets generated following the rule that it needs to be unique.