Use an email address as the primary key?
Is an email address a bad candidate for a primary key when compared to auto-incrementing numbers?
Our web application needs the email address to be unique in the system. So, I thought of using the email address as the primary key. However, my colleague suggests that string comparison will be slower than integer comparison.
Is that a valid reason not to use email as the primary key?
We are using PostgreSQL.
String comparison is slower than int comparison. However, this does not matter if you simply retrieve a user from the database using the e-mail address. It does matter if you have complex queries with multiple joins.
If you store information about users in multiple tables, the foreign keys to the users table will be the e-mail address. That means that you store the e-mail address multiple times.
I will also point out that email is a bad choice for a unique field: there are people and even small businesses that share an email address. And like phone numbers, emails can get re-used. [email protected] can easily belong to John Smith one year and Julia Smith two years later.
Another problem with emails is that they change frequently. If you are joining to other tables with the email as the key, then you will have to update the other tables as well, which can be quite a performance hit when an entire client company changes its email addresses (which I have seen happen).
The primary key should be unique and constant.
Email addresses change like the seasons. They are useful as a secondary key for lookup, but a poor choice for the primary key.
Disadvantages of using an email address as a primary key:
1. Slower when doing joins.
2. Any other record with a posted foreign key now has a larger value, taking up more disk space. (Given the cost of disk space today, this is probably a trivial issue, except to the extent that the record now takes longer to read. See #1.)
3. An email address could change, which forces all records using it as a foreign key to be updated. As email addresses don't change all that often, the performance problem is probably minor. The bigger problem is that you have to make sure to provide for it. If you have to write the code, this is more work and introduces the possibility of bugs. If your database engine supports "on update cascade", it's a minor issue.
Advantages of using an email address as a primary key:
1. You may be able to completely eliminate some joins. If all you need from the "master record" is the email address, then with an abstract integer key you would have to do a join to retrieve it. If the key is the email address, then you already have it and the join is unnecessary. Whether this helps you depends on how often this situation comes up.
2. When you are doing ad hoc queries, it's easy for a human being to see which master record is being referenced. This can be a big help when trying to track down data problems.
3. You will almost certainly need an index on the email address anyway, so making it the primary key eliminates one index, thus improving the performance of inserts, as they now have only one index to update instead of two.
In my humble opinion, it's not a slam-dunk either way. I tend to prefer to use natural keys when a practical one is available because they're just easier to work with, and the disadvantages tend to not really matter much in most cases.
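To make the "on update cascade" point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made only so the example runs anywhere), and the `users` and `orders` tables and their columns are hypothetical. PostgreSQL accepts the same `REFERENCES ... ON UPDATE CASCADE` clause.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.executescript("""
    CREATE TABLE users (
        email TEXT PRIMARY KEY
    );
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_email TEXT NOT NULL
            REFERENCES users(email) ON UPDATE CASCADE
    );
""")

conn.execute("INSERT INTO users (email) VALUES (?)", ("old@example.com",))
conn.executemany(
    "INSERT INTO orders (user_email) VALUES (?)",
    [("old@example.com",), ("old@example.com",)],
)

# Changing the natural key in one place; the database rewrites the referencing rows.
conn.execute(
    "UPDATE users SET email = ? WHERE email = ?",
    ("new@example.com", "old@example.com"),
)

referenced = [row[0] for row in conn.execute("SELECT user_email FROM orders ORDER BY id")]
print(referenced)
```

The cascade removes the need for hand-written update code, but every referencing row is still rewritten, which is the cost the other answers are worried about.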
No one seems to have mentioned a possible problem: email addresses could be considered private. If the email address is the primary key, a profile page URL will most likely look something like ..../Users/[email protected]. What if you don't want to expose the user's email address? You'd have to find some other way of identifying the user, possibly a unique integer value, to make URLs like ..../Users/1. Then you'd end up with a unique integer value after all.
It is pretty bad. Assume some e-mail provider goes out of business. Users will then want to change their e-mail. If you have used e-mail as primary key, all foreign keys for users will duplicate that e-mail, making it pretty damn hard to change ...
... and I haven't even started talking about performance considerations.
I don't know if that might be an issue in your setup, but depending on your RDBMS the values of a column might be case-sensitive. The PostgreSQL docs say: "If you declare a column as UNIQUE or PRIMARY KEY, the implicitly generated index is case-sensitive". In other words, if you accept user input for a search in a table with email as the primary key, and the user provides "[email protected]", you won't find "[email protected]" if the stored address differs in letter case.
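A small sketch of the case-sensitivity problem and one common workaround, a unique index on the lower-cased address. It assumes Python's built-in sqlite3 as a stand-in for PostgreSQL (SQLite's `=` on text is also case-sensitive by default), and the table and index names are hypothetical. PostgreSQL supports the same kind of expression index, and its `citext` extension is another option.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL
    );
    -- Uniqueness is enforced on the lower-cased value, not the raw text.
    CREATE UNIQUE INDEX users_email_lower_idx ON users (lower(email));
""")
conn.execute("INSERT INTO users (email) VALUES (?)", ("Alice@Example.com",))

# A plain equality lookup misses the row because the case differs.
exact = conn.execute(
    "SELECT id FROM users WHERE email = ?", ("alice@example.com",)
).fetchone()

# Comparing lower-cased values on both sides finds it.
folded = conn.execute(
    "SELECT id FROM users WHERE lower(email) = lower(?)", ("alice@example.com",)
).fetchone()

# The expression index rejects a second row that differs only in case.
try:
    conn.execute("INSERT INTO users (email) VALUES (?)", ("ALICE@example.com",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(exact, folded, duplicate_rejected)
```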
At the logical level, the email is the natural key.
At the physical level, given you are using a relational database, the natural key doesn't fit well as the primary key. The reason is mainly the performance issues mentioned by others.
For that reason, the design can be adapted. The natural key becomes the alternate key (UNIQUE, NOT NULL), and you use a surrogate/artificial/technical key as the primary key, which can be an auto-increment in your case.
systempuntoout asked,
That's what cascading is for.
Another reason to use a numeric surrogate key as the primary key is related to how the indexing works in your platform. In MySQL's InnoDB, for example, all indexes in a table have the primary key pre-pended to them, so you want the PK to be as small as possible (for speed's and size's sakes). Also related to this, InnoDB is faster when the primary key is stored in sequence, and a string would not help there.
Another thing to take into consideration when using a string as an alternate key, is that using a hash of the actual string that you want might be faster, skipping things like upper and lower cases of some letters. (I actually landed here while looking for a reference to confirm what I just said; still looking...)
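As a rough illustration of the hashing idea (the answer itself says it is unconfirmed, and no speed claim is made here), the sketch below normalizes an address and derives a fixed-length key from it. The function name `email_lookup_key` is hypothetical, and it assumes your application treats addresses as case-insensitive.

```python
import hashlib


def email_lookup_key(email: str) -> str:
    """Return a fixed-length key for an email address (hypothetical helper).

    Assumption: addresses are compared case-insensitively and surrounding
    whitespace is not significant.
    """
    normalized = email.strip().lower()
    # SHA-256 always yields 64 hex characters, whatever the address length.
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


key_a = email_lookup_key(" Alice@Example.com")
key_b = email_lookup_key("alice@example.com")
print(key_a == key_b, len(key_a))
```

The key could be stored in its own indexed column next to the original address, so lookups compare equal-length values while the readable email is still available.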
Yes, it is better to use an integer instead. You can also put a unique constraint on the email column.
Like this:
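A minimal sketch of such a table (not the original poster's code): an auto-incrementing integer primary key plus a unique constraint on the email column. It assumes Python's built-in sqlite3 as a stand-in for PostgreSQL, and the table and column names are hypothetical. In PostgreSQL the id column would typically be declared `SERIAL` or `GENERATED ALWAYS AS IDENTITY` instead of `AUTOINCREMENT`.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        email TEXT NOT NULL UNIQUE             -- natural key kept as an alternate key
    );
""")

# The database assigns the integer id; the application only supplies the email.
first_id = conn.execute(
    "INSERT INTO users (email) VALUES (?)", ("a@example.com",)
).lastrowid

# The unique constraint still guarantees one row per address.
try:
    conn.execute("INSERT INTO users (email) VALUES (?)", ("a@example.com",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(first_id, duplicate_rejected)
```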
Yes, it is a bad primary key because your users will want to update their email addresses.
Another reason why an integer primary key is better shows up when you refer to the email address from a different table. If the address itself is the primary key, then the other table has to use it as the foreign key, so you store each email address multiple times.
I am not too familiar with postgres. Primary Keys is a big topic. I've seen some excellent questions and answers on this site (stackoverflow.com).
I think you may get better performance by having a numeric primary key and using a UNIQUE INDEX on the email column. Emails tend to vary in length and may not be suitable for a primary key index.
some reading here and here.
Personally, I do not use any real-world information as a primary key when designing a database, because it is very likely that I will need to alter that information later. The sole reason I provide a primary key is that it is convenient for most SQL operations from the client side, and my choice for that has always been an auto-increment integer type.
I know this is a bit of a late entry, but I would like to add that people abandon email accounts, and service providers reclaim the address, allowing another person to use it.
As @HLGEM pointed out, "[email protected] can easily belong to John Smith one year and Julia Smith two years later." In this case, should John Smith want your service, you either have to refuse to use his email address or delete all your records pertaining to Julia Smith.
If you have to delete records and they relate to the financial history of the business, then depending on local law you could find yourself in hot water.
So I would never use data like email addresses, number plates, etc. as a primary key, because no matter how unique they seem, they are out of your control and can provide some interesting challenges that you may not have time to deal with.
You may need to consider any applicable data regulation legislation. Email is personal information, and if your users are EU citizens, for instance, then under GDPR they can instruct you to delete their information from your records (remember this applies regardless of which country you are based in).
If you need to keep the record itself in the database for referential integrity or for historical reasons such as audit, using a surrogate key would allow you to just NULL all the personal data fields. This obviously isn't as easy if their personal data is the primary key.
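A minimal sketch of that erasure pattern, assuming Python's built-in sqlite3 as a stand-in for PostgreSQL. The `users` and `invoices` tables and the `erase_personal_data` helper are hypothetical, and the email column is deliberately nullable here so that it can be blanked.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,   -- surrogate key, carries no personal data
        email TEXT UNIQUE,        -- nullable so it can be erased later
        full_name TEXT
    );
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        amount_cents INTEGER NOT NULL
    );
""")
conn.execute(
    "INSERT INTO users (id, email, full_name) VALUES (?, ?, ?)",
    (1, "a@example.com", "A. Example"),
)
conn.execute("INSERT INTO invoices (user_id, amount_cents) VALUES (?, ?)", (1, 4200))


def erase_personal_data(connection: sqlite3.Connection, user_id: int) -> None:
    """Blank the personal columns while keeping the row and its references."""
    connection.execute(
        "UPDATE users SET email = NULL, full_name = NULL WHERE id = ?",
        (user_id,),
    )


erase_personal_data(conn, 1)

user_row = conn.execute("SELECT email, full_name FROM users WHERE id = 1").fetchone()
invoice_count = conn.execute(
    "SELECT COUNT(*) FROM invoices WHERE user_id = 1"
).fetchone()[0]
print(user_row, invoice_count)
```

The invoice still points at user 1, so the audit trail stays intact even though nothing identifying remains in the `users` row.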
Your colleague is right: use an auto-incrementing integer for your primary key.
You can enforce email uniqueness either at the application level, or you could mark your email address column as unique and add an index on that column.
Marking the field as unique will cost you a string comparison only when inserting into that table, and not when performing joins and foreign key constraint checks.
Of course, you must note that adding any constraints to your application at the database level can cause your app to become inflexible. Always give due consideration before you make any field "unique" or "not null" just because your application needs it to be unique or non-empty.
Use a GUID as a primary key... that way you can generate it from your program when you do an INSERT, and you don't need to get a response from the server to find out what the primary key is. It will also be unique across tables and databases, and you don't have to worry about what happens if you truncate the table some day and the auto-increment gets reset to 1.
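A short sketch of client-side key generation, assuming Python's standard `uuid` module and the built-in sqlite3 as a stand-in for PostgreSQL. The table and the `insert_user` helper are hypothetical. The id is stored as text here, whereas PostgreSQL has a native `uuid` column type.

```python
import sqlite3
import uuid

# In-memory SQLite stands in for PostgreSQL; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
)


def insert_user(connection: sqlite3.Connection, email: str) -> str:
    """Insert a user under a key generated by the application, and return that key."""
    user_id = str(uuid.uuid4())  # random (version 4) UUID, created before the INSERT
    connection.execute(
        "INSERT INTO users (id, email) VALUES (?, ?)", (user_id, email)
    )
    return user_id


new_id = insert_user(conn, "a@example.com")
stored_id = conn.execute(
    "SELECT id FROM users WHERE email = ?", ("a@example.com",)
).fetchone()[0]
print(new_id == stored_id)
```

The caller knows the key without asking the database for it, which is the convenience this answer describes. The trade-off raised in other answers still applies: a UUID is larger than an integer and is not sequential.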
You can boost performance by using an integer primary key.
You should use an integer primary key. If you need the email column to be unique, why don't you simply set a unique index on that column?
If you have a non-int value as the primary key, then insertions and retrievals will be very slow on large data.
A primary key should be a static attribute. Since email addresses are not static and can be shared by multiple people, it is not a good idea to use them as a primary key. Moreover, email addresses are strings that are usually longer than the unique ID we would like to use [len(email_address) > len(unique_id)], so they require more space and, even worse, they are stored multiple times as foreign keys. Consequently, performance degrades.
It depends on the table. If the rows in your table represent email addresses, then email is the best ID. If not, then email is not a good ID.
If it's simply a matter of requiring the email to be unique then you can just create a unique index with that column.
Email is a good candidate for a unique index, but not for a primary key. If it is the primary key, you will not be able to change a contact's email address, for example.
I think your join queries will be slower too.
Do not use the email address as the primary key. Keep the email unique, but use a user ID or username as the primary key instead.