Is it better to use a URL (a long string) as the primary key, or a shorter serial integer primary key?
Say I'm storing webpages in PostgreSQL. Is it more convenient to use the URL of the webpage as the primary key, or to create a more succinct SERIAL integer primary key? What is the recommended approach for a case like this?
Comments (3)
For what purpose are you storing the webpages?
How you should cache webpages depends on why you are caching them. The first thing that comes to mind is that URLs can change. Do you want your record to change its primary key as well? Or would a new URL be a new record?
With a few exceptions, it is almost always better for the PK to have no meaning beyond being a reference to that row in the database, i.e., a surrogate key. You can then put a unique constraint on the URI field if you'd like. If nothing else, this keeps everything that references the table from also having to hold a copy of the string. And if you later need to obfuscate the string, pull it into another table for analytics or restructuring, or do anything else along those lines, the surrogate key will serve you better than the natural key.
I also generally prefer to avoid keys made up of several parts that each carry meaning, and a plain-text URI has those in abundance, since it can generally be broken into components.
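As a rough sketch of the surrogate-key layout described above: the snippet below uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for PostgreSQL, so that it runs without a server. The table names (webpage, link), the column names, and the example URLs are all made up for illustration. In PostgreSQL the id column would be declared as SERIAL (or an identity column) instead of SQLite's INTEGER PRIMARY KEY.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: the integer id is the surrogate key, while the URL is
# kept unique by a constraint rather than by being the primary key.
conn.executescript("""
CREATE TABLE webpage (
    id   INTEGER PRIMARY KEY,   -- would be SERIAL or an identity column in PostgreSQL
    url  TEXT NOT NULL UNIQUE,  -- natural identifier, enforced but not the key
    body TEXT
);
CREATE TABLE link (
    from_page INTEGER NOT NULL REFERENCES webpage(id),
    to_page   INTEGER NOT NULL REFERENCES webpage(id)
);
""")

# Two pages and one link between them; the link stores only the integer ids.
a = conn.execute("INSERT INTO webpage (url) VALUES (?)",
                 ("http://example.com/a",)).lastrowid
b = conn.execute("INSERT INTO webpage (url) VALUES (?)",
                 ("http://example.com/b",)).lastrowid
conn.execute("INSERT INTO link (from_page, to_page) VALUES (?, ?)", (a, b))

# The URL of page b changes: one row is updated, and no referencing row
# has to be touched because nothing else holds a copy of the string.
conn.execute("UPDATE webpage SET url = ? WHERE id = ?",
             ("https://example.com/b-moved", b))

# The existing link still resolves to the page under its new URL.
target_url = conn.execute(
    "SELECT w.url FROM link AS l JOIN webpage AS w ON w.id = l.to_page "
    "WHERE l.from_page = ?", (a,)
).fetchone()[0]
print(target_url)
```

With the URL itself as the primary key, the same change would have to be propagated to every row in link that points at the page.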
Another thing you may want to consider is the performance hit you will take when using a string field in JOINs, indexes, and conditions. I must agree with @dclelements and @Neil in recommending an integer PK field.
Another consideration is that URL PK values cannot be auto-incremented, so you will have to handle the (more likely) duplicate inserts to the table yourself.
An integer PK is the better database design.
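One possible way to handle the duplicate inserts mentioned above, sketched under some assumptions: the URL column carries a unique constraint, and a second insert of the same URL should simply be ignored. The helper name get_or_create_page and the webpage table are hypothetical, and sqlite3 again stands in for PostgreSQL. The ON CONFLICT ... DO NOTHING clause is accepted by PostgreSQL 9.5+ and by SQLite 3.24+.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE webpage (id INTEGER PRIMARY KEY, url TEXT NOT NULL UNIQUE)"
)

def get_or_create_page(conn, url):
    """Return the integer id for url, inserting a new row only if needed."""
    # The unique constraint on url turns a duplicate insert into a no-op.
    conn.execute(
        "INSERT INTO webpage (url) VALUES (?) ON CONFLICT (url) DO NOTHING",
        (url,),
    )
    # Whether or not a row was just inserted, the id can now be looked up.
    row = conn.execute(
        "SELECT id FROM webpage WHERE url = ?", (url,)
    ).fetchone()
    return row[0]

first = get_or_create_page(conn, "http://example.com/")
again = get_or_create_page(conn, "http://example.com/")       # duplicate URL
other = get_or_create_page(conn, "http://example.com/about")  # new URL

row_count = conn.execute("SELECT COUNT(*) FROM webpage").fetchone()[0]
print(first == again, first != other, row_count)
```

The duplicate call returns the id of the existing row instead of raising an error, so callers never need to know whether the page was already stored.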