候选键有什么意义?
我对数据库管理相当陌生,这个问题似乎永远不能用一句话来回答。所有其他 SO 答案都说“候选密钥是最小超级密钥”。这对我来说毫无意义。
候选键应该指定数据库记录的唯一性,对吗?主键是候选键。如果主键已经指定了唯一性,那么添加更多候选键有什么意义呢?
我见过如下示例记录:
Employee(ID, Name, PhoneNumber)
其中 ID 是主键,PhoneNumber 是候选键。据我所知,ID 足以指定员工记录的唯一性。尽管 PhoneNumbers(可能)是唯一的,但将它们指定为候选键对我来说似乎并不是“最小的”。
I'm fairly new to database management and this question never seems to be answered in more than one sentence. All other SO answers say "A candidate key is a minimal super key." That means nothing to me.
A candidate key is supposed to specify uniqueness of a db record, correct? And a primary key is a candidate key. If a primary key already specifies uniqueness, what's point of adding more candidate keys?
I have seen example records like the following:
Employee(ID, Name, PhoneNumber)
where ID is the primary key and PhoneNumber is a candidate key. From what I see, the ID is enough to specify the uniqueness of an employee record. Although PhoneNumbers are (probably) unique, specifying them as a candidate key does not seem "minimal" to me.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(9)
这意味着如果 PhoneNumber 确实是候选键,您可以删除 ID 列并使用 PhoneNumber 代替。换句话说,它是唯一密钥的候选者。
维基百科有一个更正式定义,很多人都想看看。
It means that if PhoneNumber was indeed a candidate key you could delete the ID column and use PhoneNumber instead. In other words, it is a candidate for being a unique key.
Wikipedia has a more formal definition that you many want to look at.
密钥称为候选密钥,因为虽然它可以用作 PK,但它不一定是 PK。
给定行可以有多个候选键,例如
EmployeeID
和SSN
。通常,不是使用候选键作为 PK,而是创建代理键 这是因为关于使用哪个候选键的决定稍后可能会被发现是错误的,这可能会导致非常头痛(字面意思)。
另一个原因是可以使用高效的数据类型来创建代理键以用于索引目的,而候选键可能没有(例如,UserImage)。
第三个原因是许多ORM仅适用于单列PK< /strong>,因此在这种情况下排除由多列(复合键)组成的候选键。
许多开发人员没有意识到,在自然键上选择代理键可能会在数据完整性方面做出妥协。通过选择代理键,您可能会失去对数据的一些约束,并且如果选择代理键,通常需要触发器来模拟约束。
A key is called a candidate key, because while it could be used as a PK, it is not necessarily the PK.
There can be more than one candidate key for a given row, e.g.,
EmployeeID
andSSN
.Often, rather than using a candidate key as the PK, a surrogate key is created instead. This is because decisions around what candidate key to use can be found to be erroneous later, which can cause a huge headache (literally).
Another reason is that a surrogate key can be created using an efficient data type for indexing purposes, which the candidate keys may not have (e.g., a
UserImage
).A third reason is that many ORMs work only with a single-column PK, so candidate keys composed of more than one column (composite keys) are ruled out in that case.
Something that many developers do not realize is that selecting a surrogate key over a natural key may be a compromise in terms of data integrity. You may be losing some constraints on your data by selecting a surrogate key, and often a trigger is required to simulate the constraint if a surrogate key is chosen.
候选键是在设计数据库系统时出现的一个概念。
假设您的系统将有一个名为 User 的表,定义如下:
那么,您必须选择这些列的哪个子集将作为您的主键。
当然,设计数据库时的目标是使该集合保持最小。你
如果仅 SSN 已保证唯一性,则不会使用该对(SSN、信用卡号码)。
现在,假设fullName、SSN 和creditCardNumber 都是您以某种方式知道的字段,
对于所有用户来说都是唯一的。你可以使用其中任何一个作为你的 PK,所以它们是
所有候选键(而年龄则不是)。
你会选择哪个?
这将取决于字段的数据类型等因素(最好是
例如,在整数列而不是 varchar 列上设置索引)。
Candidate key is a concept that appears when you are designing a database system.
Suppose your system will have a table named User, defined as below:
Well, you have to choose which subset of these columns will be your primary key.
The goal when designing the database is, of course, to keep this set minimal. You
wouldn't use the pair (SSN, creditCardNumber) if the SSN alone already guarantees uniqueness.
Now, suppose that fullName, SSN and creditCardNumber are all fields that you know, somehow,
that are unique for all users. You could use any of those as your PK, so they are
all candidate keys (whereas age, on the other hand, is not).
Which will you choose?
That will depend on factors such as the datatype of the field (it's preferable to
set an index on an integer column rather than on an varchar column, for example).
候选键与主键的含义完全相同。所有候选键都有相同的目的:通过防止重复数据来确保数据完整性。如果有必要确保某些数据不重复,那么在表中强制执行多个候选键是完全明智的。
请注意,本文中的最小并不意味着最小。就是不可约的意思。即,在仍保持其唯一性的同时,不能从键中删除任何属性。
Candidate key means exactly the same thing as primary key. All candidate keys serve the same purpose: to ensure data integrity by preventing duplicate data. It's perfectly sensible for a table to have more than one candidate key enforced in it if that's necessary to ensure some data is not duplicated.
Note that minimal in this context does NOT mean smallest. It means irreducible. i.e. no attribute can be removed from the key while still maintaining its uniqueness.
我们使用候选键,因为有时需要处理记录和其他问题,例如银行系统的数据库,其中Account_No是主键,SocialSecurity_NO将是候选键,我们将SocialSecurity_NO保留为唯一键,因为如果政府在提供 SocialSecurity_NO 时出现错误,那么这将是一个大问题,因此我们已经将其声明为唯一密钥,因此具有相同 SocialSecurity_NO 的两个用户不可能能够获得帐户...
主键- ----------候选键---------attribute3-----attribute4
Account_No SocialSecurity_NO
We use candidate key, because some times it is necessary to take care of the record and other issue, for example a data base for bank system, where Account_No is primary key and SocialSecurity_NO will be candidate key, we keep the SocialSecurity_NO as UNIQUE key, because if government got a mistake in providing the SocialSecurity_NO, then it will be big problem, so we have already declared it as UNIQUE key, so there is no possibility that two user with same SocialSecurity_NO will be able for account...
primary key----------- candidate key---------attribute3-------attribute4
Account_No SocialSecurity_NO
在 RM(关系模型)中: 超级键是唯一的列集。 CK(候选密钥)是不包含更小的超级密钥的超级密钥。它是 PK(主键)的候选者。 PK 只是您决定称之为 PK 的某个 CK。其他 CK 是 AK(备用密钥)。
SQL PK 是 UNIQUE NOT NULL,不一定是 CK/PK。因此,我们可以说 SQL CK 是 UNIQUE NOT NULL,是声明为 SQL PK 的候选者,并且我们可以将其他 SQL AK 称为 SQL AK。在 SQL 环境中,当人们在 RM 和 SQL 中使用这些具有不同含义的术语时,您必须询问他们的意思是什么。 SQL。
In the RM (relational model): A superkey is a unique column set. A CK (candidate key) is a superkey containing no smaller superkey. It is a candidate for being a PK (primary key). A PK is just some CK you decided to call the PK. The other CKs are AKs (alternate keys).
A SQL PK is a UNIQUE NOT NULL, not necessarily a CK/PK. So we can say an SQL CK is UNIQUE NOT NULL, a candidate for being declared as SQL PK, and we can call the others SQL AKs. In an SQL context you have to ask people what they mean when they use these terms with different meanings in the RM & SQL.
“候选人钥匙”是候选人,其含义与总统候选人是候选人完全相同:它们是呈现给您的选择,您必须从中选择一个。它们是主键的候选者。选择一个。在实践中,并非所有选择都一定同样好。
A "candidate key" is a candidate in exactly the same sense that candidates for President are candidates: They are the choices presented to you, from which you must choose one. They are candidates for the primary key. Pick one. In practice, not all choices are necessarily equally good.
候选键通常指那些可能被选为自然主键的列。然而,自然主键通常也不是一个好主意,因为它们是唯一的,但并非一成不变(想想因为公司名称更改而更改数百万个子行所造成的严重破坏),或者因为它们在连接方面的效率低于代理键。此外,在现实生活中,许多潜在的候选键对于真正的 PK 来说不够稳定,并且不像我们想象的那么独特。例如,电子邮件可以在帐户关闭后重新使用。
您可以添加其他索引来指定唯一性,以维护数据完整性。例如,在您的示例中,您使用代理键来确保记录的唯一性。但这并不能确保人/电话组合仅输入一次。因此,如果您使用代理键,您可能希望为数据中任何可能的候选键(这可以由一个或多个字段组成)创建一个唯一索引。这确保了需要唯一的项目,并允许您获得代理键的性能优势。
此外,有时存在多个字段或字段组合,这些字段或字段组合应该是唯一的。例如,假设您有一个存储组织结构的表。每个组织实体都应该是唯一的,每个组织/个人组合也应该是唯一的(假设没有工作共享)。通过在这些字段上放置唯一索引,您可以防止将错误数据插入到表中。
Candidate keys usually refer to those columns which could potentially be selected as the natural primary key. However, natural primary keys are also often a bad idea because they are unique but not unchanging (think of the havoc of changing millions of child rows because a company name changed) or because they are less efficient in joins than surrogate keys. Further, in real life, many potential candidate keys are not nearly stable enough for a true PK and are not as unique as we think. Emails, for instance, can be reused after an account is closed.
You add other indexes to specify uniqueness in order to maintain data integrity. For instance in your example, you are using a surrogate key to ensure uniqueness of a record. But this does not ensure that the person/phone combination is entered only once. So you would want to create a unique index for any possible candidate key (and this can consist of one or more fields) in the data if you are using a surrogate key. This ensures that the items which need to be unique are and allows you to get the performance benefits of a surrogate key.
Further sometimes there are mulitple fields or combinations of fields which should be unique. For instance suppose you have a table storing organization structure. Each organizational entity should be unique, so should each organization/person combination (assuming that there is no job sharing). By placing unique indexes on these fields, you are able to prevent bad data from being inserted into the table.
简而言之:候选密钥是一个最小的超级密钥。
其中超级键是唯一标识 RDBMS 中关系(表)中的任何记录(或元组)的列(或属性)的组合。
例如,考虑具有 A、B、C 和 D 列的表中的以下依赖关系(给出此表只是为了快速示例,因此不涵盖 R 可能具有的所有依赖关系)。
属性集(决定因素)---可以识别--->(从属)
A-----> AD
B-----> ABCD
C-----> CD
AC----->ACD
AB----->ABCD
ABC----->ABCD
BCD----->ABCD
现在,B、AB、ABC、BCD 标识所有列,因此这四人有资格获得超级钥匙。
但是,B⊂AB; B⊂ABC; B⊂BCD 因此 AB、ABC 和 BCD 不符合候选键的资格,因为它们的子集可以识别关系,因此它们不是最小的,因此只有 B 是候选键,而不是其他键。
谢谢你的询问。
In nutshell: CANDIDATE KEY is a minimal SUPER KEY.
Where Super key is the combination of columns(or attributes) that uniquely identify any record(or tuple) in a relation(table) in RDBMS.
For instance, consider the following dependencies in a table having columns A, B, C and D (Giving this table just for a quick example so not covering all dependencies that R could have).
Attribute set (Determinant)---Can Identify--->(Dependent)
A-----> AD
B-----> ABCD
C-----> CD
AC----->ACD
AB----->ABCD
ABC----->ABCD
BCD----->ABCD
Now, B, AB, ABC, BCD identifies all columns so those four qualify for the super key.
But, B⊂AB; B⊂ABC; B⊂BCD hence AB, ABC, and BCD disqualified for CANDIDATE KEY as their subsets could identify the relation, so they aren't minimal and hence only B is the candidate key, not the others.
Thanks for asking.