在 SQL 数据库中复制/不必要的数据

发布于 2024-08-12 11:43:51 字数 486 浏览 6 评论 0原文

当为多个用户设计新的应用程序/数据库时,通常需要有一个用于用户管理的表。

该表通常总是有用户名和 ID......

只是为了检查我不是一个白痴(好吧,你可能仍然这么认为!)我刚刚下载并查看了 MediaWiki 和 PHPBB 的架构,他们也这样做。

当我第一次了解关系数据库时,我总是被告知一条重要规则是永远不要重复数据或做任何不必要的事情。

那么,为什么我们使用 ID 作为主键而不是 Username 呢?

如果它不是唯一的(例如 SO 系列网站),我理解原因,但在这些应用程序中,它是唯一的。

我唯一能想到的是,执行 Select * from xxx where ID="454" 而不是 Select * from xxx where name="some_really_long_name" 更快或者因为名称很长会大大增加数据库的大小。

这些是唯一的原因,还是我在这里错过了什么?

When designing a new application/database for multiple users, it usually involves having a table for user management.

The table usually always has Username as well as ID....

Just to check that I am not being an idiot (well, you may still think that!) I have just downloaded and taken a look at the schema of both MediaWiki and PHPBB, and they do the same.

When I first learnt about relational databases, I was always told that a big rule was never to duplicate data or do anything Unnecessary.

So, why is it that we have ID as the primary key instead of Username?

I understand the reason if it was not unique (such as the SO series of sites), however in these applications, it is.

The only thing I could think of is it is quicker to do a Select * from xxx where ID="454" instead of Select * from xxx where name="some_really_long_name" or because having really long names would increase the database size a lot.

Are these the only reasons, or have I missed something here?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

峩卟喜欢 2024-08-19 11:43:51

使用整数非智能主键代替唯一文本键的原因:

  1. JOIN 和其他查询的速度。

  2. 能够更改文本键值并保持数据库的完整性,而无需更新引用该键的每个表。

    能够更改文本

  3. 当您的应用程序必须在内存中构建键值列表时,提高代码效率并减少内存开销。

  4. 减少引用该键的表的大小。

Reasons for using an integer non-intelligent primary key in place of a unique text key:

  1. Speed of JOINs and other queries.

  2. Ability to change the text key value and preserve the integrity of the database without have to update every table in which the key is referenced.

  3. Increased code efficiency and reduced memory overhead when your application must build a list of key values in memory.

  4. Decreased size of tables which reference the key.

审判长 2024-08-19 11:43:51

键有两个不同的用途,一是防止插入重复行......这并不意味着数据值全部相同,而是意味着两行代表相同的现实世界实体。只有有意义的自然键才能实现这一点。
第二个目的是充当依赖表中外键列的目标。
为此,最窄(最小字节数)的键将为与该键一起使用并在执行搜索时使用的索引生成最佳性能。

因此,当自然键由多列组成或非常宽时,有时建议创建第二个备用键或代理键作为其他表中的 FK 参考文献。这通常是内部创建的值,在数据库内创建,不会暴露在应用程序或系统之外,甚至可能不会暴露在数据库组件本身之外。

如果这是唯一的键,因为它不是有意义的或自然的键,则完全不足以确保数据一致性,因为两行代表同一实体,并且其所有属性仅通过无意义的代理键而不同,仍然可以插入到表中。
因此,在这种情况下,最好将两个钥匙都放在桌子上。

除了经常用于提高性能之外,代理键还有一个额外的优点(因为它们没有意义):无需更改。提出尽可能最好的自然键,准确且唯一地识别实体,并且永远不需要更改,是一种艺术形式,很容易做得不好。 (SSAN 是一个典型的例子)然后,如果现实世界的实体更改了设计不良的自然键中使用的任何值,并且您将其用作唯一的键(因此在其他地方用作 FK),您将必须更改值在数据库中的任何位置,包括在将其用作外键的所有其他表中。

Keys serve two disparate purposes, one is to prevent insertion of duplicate rows.... that doesn't mean that the data values are all the same, it means that the two rows represent the same real-world entity. Only a meaningful natural key can accomplish this.
The second purpose is to act as targets of foreign key columns in dependent tables.
For this, the narrowest, (smallest number of bytes) key will generate the best performance on the index that goes with the key and is used when performing searches.

So when the natural key consists of multiple columns, or is very wide, then sometimes it is advisable to create a second alternate, or surrogate key to be used as the target of FK references in other tables. This is generally an internally created value, created within the database, which is not exposed outside the application or system, perhaps not even outside the database component itself.

If that was then the only key, as it is not a meaningful, or natural key, it is totally insufficient to ensure data consistency, as two rows which represent the same entity, and differ in all their attributes only by the meaningless surrogate key, can still be inserted into the table.
Therefore, in such situations, it is a good practice to have both keys on the table.

In addition to often being used to increase performance, surrogate keys have the additional advantedge (Because they are non-meaningful) of not ever having to be changed. Coming up with the best possible natural key, that accurately and uniquely identifies the entity, and will never need to be changed, is an art form and can easily be done poorly. (SSANs are a canonical example) Then, if the real world entity changes any of the values used in a poorly designed natural key, and you are using it as the only key, (and therefore as FKs elsewhere) you will have to change the value everywhere in the database, including in all the other tables where you are using it as a foreign key.

羞稚 2024-08-19 11:43:51

...为什么我们使用 ID 作为主键而不是用户名?

数据建模的第一条规则是您不要公开您的密钥。这是因为:

  1. 您希望使系统免于向用户群显示无法更改的信息。
  2. 这里存在一个安全问题:假设您不采取预防措施,现在您的用户可能会被欺骗。
  3. 如果您必须迁移到新的数据库、同一供应商或其他供应商 - 过渡应该是无缝的。您可能需要更新引用,但用户会注意到昨天是 1234,今天是 100234。人们已经很难记住创建强密码了……
  4. 基于整数的密钥速度更快,占用的空间也更少。这些被称为人造钥匙。
  5. 允许某人更改用户名是一个很好的功能;当您的系统处理人们因结婚(或离婚)而更改姓名时,这是必要的。

自然密钥与人工密钥

自然密钥只是按原样或作为普遍约定而存在的密钥。例如,美国各州的缩写是理想的密钥,因为:

  1. 它是独一无二的
  2. ,每个了解美国各州的人都会知道该缩写。

人工(又称为代理)密钥与数据没有任何关系 - ID #1234 可以引用任何内容。

理论上,尽可能使用人造钥匙。由于 ID 1234 可能意味着任何内容,因此当您使用自然键时,无需加入即可了解给定 id 值的含义。实际上,由于性能和自然键不太常见 - 使用人工键。

...why is it that we have ID as the primary key instead of Username?

The first rule of data modelling is that you do not expose your keys. This is because:

  1. You want to insulate your system from displaying information to the userbase that can not be altered.
  2. There's a security concern: now your user could be spoofed, assuming you don't take precautions
  3. If you have to migrate to a new database, same vendor or otherwise - the transition should be seamless. You'll likely have to update references, but users will notice if they were 1234 yesterday, and 100234 today. People have enough trouble remembering much less creating a strong password...
  4. Integer based keys are faster, and take less space. These are called artificial keys.
  5. It's a nice feature to allow someone to change their username; when your system deals with peoples names changing because of being married (or divorced) it is a necessity.

Natural vs Artificial Keys

A natural key is only a key that exists, as-is or as a commonly held convention. For example - the abbreviation of a US state makes for an ideal key because:

  1. it's unique
  2. everyone who knows US states will know the abbreviation.

An artificial (AKA surrogate) key does not have any relation to the data - ID #1234 could refer to anything.

In theory, use artificial keys as much as possible. Because ID 1234 could mean anything, when you use natural keys there's less need to have to join to know what a given id value means. In reality, because of both performance and that natural keys aren't very common - use artificial keys.

各自安好 2024-08-19 11:43:51
  1. 用户名可以轻松更改/重命名。这非常有用。
  2. 它稍微快
  3. 一点总体上它节省了空间/内存(确实,很小的量)

PS:不要忘记在username列上添加unique约束。

  1. A username can be easly changed/renamed. It's very useful.
  2. It's marginally faster
  3. Overall it saves space/memory(true, tiny amounts)

PS: don't forget to add a unique constraint on the username column.

隐诗 2024-08-19 11:43:51

这就是所谓的代理键。它不是复制数据,而是替代数据。大多数情况下,它的存在只是因为它更容易,特别是当自然键是多个字段时,然后您必须将其用作另一个表中的外键。

它们很常见,但存在健康的争论它们是否“正确”。就我个人而言,我使用它们只是因为它简化了生活。另一个好处是,如果您使用自然键并且有人输错了该键,那么您最终必须在多个表之间进行级联更新(如果它是外键)。由于代理键不可能拼写错误(它是系统生成的),因此这从来都不是问题。

This is what is known as a Surrogate Key. It is not duplicating data, it is standing in for data. Mostly it exists simply because it's easier, especially when the natural key is multiple fields, and then you have to use that as a foreign key in another table.

They are common, but there is healthy debate about if they are "correct" or not. Personally, I use them just because it simplifies life. Another bonus is that if you use a natural key and someone typos the key, you end up having to do a cascading update across multiple tables if it's a foreign key. Since it is impossible to typo a surrogate key (it being system generated), this is never an issue.

淡莣 2024-08-19 11:43:51

至少有一个原因 - 如果用户名不是主键,则可以便宜地更改用户名。

At least one reason - you can change a Username inexpensively if it's not the primary key.

长亭外,古道边 2024-08-19 11:43:51

一个关键的优点是,通过拥有单独的 ID,您可以更改用户名,而不会破坏任何外键关系。

此外,使用没有意义的主键是良好的 RDBMS 实践 - 用户名显然有意义

One key advantage is by having a separate ID you are able to change the username without breaking any of the foreign key relations.

Additionally, it is good RDBMS practice to have primary keys that have no meaning - a username obviously has meaning

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文