Person表主键的最佳选择
您在代表个人(如客户、用户、客户、员工等)的表中选择什么主键?我的第一选择是社会安全号码 (SSN)。然而,由于隐私问题和不同的法规,人们不鼓励使用 SSN。 SSN 可以在人的一生中发生变化,因此这是反对它的另一个原因。
我猜想,精心选择的自然主键的功能之一就是避免重复。我不希望一个人在数据库中注册两次。某些代理或生成的主键无助于避免重复条目。解决这个问题的最佳方法是什么?
保证应用程序中个人实体唯一性的最佳方法是什么?可以通过主键或唯一性约束在数据库级别上处理吗?
What is your choice for primary key in tables that represent a person (like Client, User, Customer, Employee etc.)? My first choice would be an social security number (SSN). However, using SSN has been discouraged because of privacy concerns and different regulations. SSN can change during person lifetime, so that is another reason against it.
I guess that one of the functions of well chosen natural primary key is to avoid duplication. I do not want a person to be registered twice in the database. Some surrogate or generated primary key does not help in avoiding duplicate entries. What is the best way to approach this?
What is the best way to guarantee uniqueness in your application for person entity and can this be handled on database level with primary key or uniqueness constraint?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
我不知道您正在使用哪个数据库引擎,但是(至少对于MySQL - 请参阅7.4.1. 让数据尽可能小),使用尽可能短的整数通常被认为最适合性能和内存要求。
我将使用整数
auto_increment
作为该主键。这个想法是:
然后,如果可能和/或必要的话,在另一列(确定唯一性的列)上设置一个
UNIQUE
索引。编辑:以下是您可能感兴趣的其他几个问题/答案:
I don't know which Database engine you are using, but (at least with MySQL -- see 7.4.1. Make Your Data as Small as Possible), using an integer, the shortest possible, is generally considered best for performances and memory requirements.
I would use an integer,
auto_increment
, for that primary key.The idea being :
And, then, set a
UNIQUE
index on an other column -- the one that determines unicity -- if that's possible and/or necessary.Edit: Here are a couple of other questions/answers that might interest you :
如上所述,使用自动增量作为主键。但我不认为这是你真正的问题。
您真正的问题是如何避免重复条目。从理论上讲,这是不可能的——两个人可以在同一天出生、同名、住在同一个家庭,而且其中一方没有社会保险号码。 (其中一个可能是访问该国的外国人)。
然而,全名、出生日期、地址和电话号码的组合通常足以避免重复。请注意,输入的地址可能会有所不同,人们可能有多个电话号码,并且人们可能会选择省略中间名或使用首字母缩写。这取决于避免重复条目的重要性以及您的用户群有多大(以及发生冲突的可能性)。
当然,如果您可以获得 SSN/SIN,则可以使用它来确定唯一性。
As mentioned above, use an auto-increment as your primary key. But I don't believe this is your real question.
Your real question is how to avoid duplicate entries. In theory, there is no way - 2 people could be born on the same day, with the same name, and live in the same household, and not have a social insurance number available for one or the other. (One might be a foreigner visiting the country).
However, the combination of full name, birthdate, address, and telephone number is usually sufficient to avoid duplication. Note that addresses may be entered differently, people may have multiple phone numbers, and people may choose to omit their middle name or use an initial. It depends on how important it is to avoid duplicate entries, and how large is your userbase (and thus the likelihood of a collision).
Of course, if you can get the SSN/SIN then use that to determine uniqueness.
您可以使用哪些属性?您的应用程序关心哪些?例如,没有两个人可以在完全相同的时间、完全相同的地点出生,但您可能无法以这种准确度访问该数据!因此,您需要根据想要建模的属性来决定哪些属性足以提供可接受的数据完整性级别。无论您选择什么,您正确地关注您选择的数据完整性方面(防止为同一个人插入多行)。
对于其他表中的联接/外键,最好使用代理键。
我逐渐认为使用“主键”这个词是用词不当,或者充其量是令人困惑。任何键,无论您将其标记为主键、备用键、唯一键还是唯一索引,都是仍然是一个键,并且要求表中的每一行都包含键中属性的唯一值。从这个意义上说,所有键都是等效的。更重要(最重要)的是它们是自然键(取决于有意义的实域模型数据属性)还是代理项(独立于真实数据属性)
其次,同样重要的是您使用该密钥的目的。 代理键狭窄而简单,并且永远不会改变(没有理由 - 它们没有任何意义),因此它们是连接或其他依赖表中的外键的更好选择。
但是为了确保数据完整性并防止为同一域实体插入多行,它们完全没有用...为此,您需要某种从可用数据中选择的自然密钥,以及您的应用程序出于某种目的对其进行建模。
密钥不必是 100% 不可变的。例如,如果(作为示例)您使用姓名、电话号码和出生日期,即使某人更改了姓名或电话号码,您也只需更改表中的值即可。只要其他行的键属性中没有新值,就可以了。
即使你选择的钥匙只能在 99.9% 的情况下起作用(假设你很不幸遇到两个同名同电话、巧合同一天出生的人),那么,至少 99.9% 的情况将保证数据的准确性和一致性 - 例如,您可以仅在其出生日期中添加时间以使其唯一,或者在密钥中添加一些其他属性以区分它们。只要您不必因为更改而更新整个数据库中外键中的数据值(因为您没有在其他地方使用此键作为 FK),您就不会遇到任何重大问题。
What attributes are available to you? Which ones does your application care about ? For example no two people can be born at exactly the same second at exactly the same place, but you probably don't have access to that data at that level of accuracy! So you need to decide, from the attributes you intend on modeling, which ones are sufficient to provide an acceptable level of data integrity. Whatever you choose, you're right in focusing on the data integrity aspects (preventing insertion of multiple rows for the same person) of your selection.
For Joins/Foreign Keys in other tables, it is best to use a surrogate key.
I've grown to consider the use of the word Primary Key as a misnomer, or at best, confusing. Any key, whether you flag it as Primary Key, Alternate Key, Unique Key, or Unique Index, is still a Key, and requires that every row in the table contain unique values for the attributes in the key. In that sense, all keys are equivilent. What matters more (Most), is whether they are natural keys (dependant on meaningful real- domain model data attributes), or surrogates (Independendant of real data attributes)
Secondly, what also matters is what you use the key for.. Surrogate keys are narrow and simple and never change (No reason to - they don't mean anything) So they are a better choice for joins or for foreign Keys in other dependant tables.
But to ensure data integrity, and prevent insertion of multiple rows for the same domain entity, they are totally useless... For that you need some kind of Natural Key, chosen from the data you have available, and which your application is modeling for some purpose.
The key does not have to be 100% immutable. If (as an example), you use Name and Phone Number and Birthdate, for example, even if a person changes their name, or their phone number, you can simply change the value in the table. As long as no other row already has the new values in their key attributes, you are fine.
Even if the key you select only works in 99.9% of the cases, (say you are unlucky enough to run into two people with the same name and phone number and were coincidentally born the same day), well, at least 99.9% of your data will be guaranteed to be accurate and consistent - and you can for example, just add time to their birthdate to make them unique, or add some other attribute to the key to distinquish them. As long as you don't have to update data values in Foreign Keys throughout your database because of the change, (since you are not using this key as a FK elsewhere) you are not facing any significant issue.
使用自动生成的整数主键,然后对您认为应该唯一的任何内容设置唯一约束。但 SSN 在现实世界中并不是唯一的,因此在此列上设置唯一性约束不是一个好主意,除非您认为因为数据库不接受客户而拒绝客户是一个很好的商业模式。
Use an autogenerated integer primary key, and then put a unique constraint on anything that you believe should be unique. But SSNs are not unique in the real world so it would be a bad idea to put a uniqueness constraint on this column unless you think turning away customers because your database won't accept them is a good business model.
我更喜欢自然键,但表
person
是一个丢失的情况。 SSN 并不是唯一的,也不是每个人都有。I prefer natural keys, but a table
person
is a lost case. SSNs are not unique and not everybody has one.我推荐一个代理键。添加其他候选键所需的所有索引,但我的建议是将业务逻辑排除在键之外。
I'd recommend a surrogate key. Add all the indexes you need for other candidate keys, but keeping business logic out of the key is my recommendation.
我更喜欢自然密钥,因为它们是可信的。
除非您经营一家银行或类似机构,否则您的客户和用户没有理由为您提供有效的 SSN,甚至没有必要拥有一个。因此,出于商业原因,在您概述的案例中,您被迫不信任 SSN。对于任何给定的“人”自然键,类似的论证都成立。
您别无选择,只能分配一个人工(读“代理”)密钥。它也可能是一个整数。确保它是足够大的整数,这样您就不需要很快真正扩展它。
I prefer natural keys, when they can be trusted.
Unless you are running a bank or something like that, there is no reason for your clients and users to provide you with a valid SSN, or even necessarily to have one. Thus, for business reasons, you are forced to distrust SSN in the case you outline. A similar argumant would hold for any given natural key to "persons".
You have no choice but to assign an artificial (Read "surrogate") key. It might as well be an integer. Make sure it's big enough integer so you aren't going to need toexpand it real soon.
要添加到 @Mark 和 @Pascal(自动增量整数是最好的选择)——SSN 很有用,应该正确建模。安全问题是应用程序逻辑的一部分。您可以将它们标准化为单独的表,并且可以通过提供发布日期字段使它们唯一。
ps,对于那些不同意“应用程序安全性”观点的人,企业数据库将具有细粒度的 ACL 模型;所以这不会成为症结所在。
To add to @Mark and @Pascal (autoincrement integers are your best bet) -- SSN's are usefull and should be modelled correctly. Security concerns are part of application logic. You can normalize them into a separate table, and you can make them unique by providing a date-issued field.
p.s., to those who disagree with the `security in application' point, an enterprise DB will have a granular ACL model; so this won't be a sticking point.