There's no problem with building your primary key from several fields; that's a natural key.
You can use an identity column (backed by a unique index on the candidate fields) to make a surrogate key.
That's an old discussion. I prefer surrogate keys in most situations.
But there's no excuse for having no key at all.
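As a rough illustration of the identity-column-plus-unique-index idea, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for whatever database you actually use, and the `person` table and its column names are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        id_person     INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        name          TEXT NOT NULL,
        social_number TEXT NOT NULL,
        UNIQUE (name, social_number)                      -- natural candidate key
    );
""")

conn.execute("INSERT INTO person (name, social_number) VALUES (?, ?)", ("Ann", "123"))
first_id = conn.execute("SELECT id_person FROM person").fetchone()[0]

# A second row with the same candidate-key values is rejected by the unique index,
# even though the surrogate key alone would happily hand out a new id.
try:
    conn.execute("INSERT INTO person (name, social_number) VALUES (?, ?)", ("Ann", "123"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(first_id, duplicate_rejected)
```

The surrogate key gives other tables something stable to point at, while the unique index still stops the same real-world person from being entered twice.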
RE: EDIT
Yeah, there's a lot of controversy about that :D
I don't see any obvious advantage to natural keys, besides the fact that they are the natural choice. You will always think in terms of Name, SocialNumber, or something like that, instead of idPerson.
Surrogate keys are the answer to some of the problems that natural keys have (propagating changes, for example).
As you get used to surrogates, they seem cleaner and more manageable.
But in the end, you'll find that it's just a matter of taste, or mindset. Some people "think better" with natural keys, and others don't.
Tables should always have a primary key. When a table doesn't have one, it should be given an auto-increment field.
Sometimes people omit the primary key because they are transferring a lot of data and the key might slow the process down (depending on the database). BUT it should be added afterwards.
Someone commented about link tables: that's right, they are an exception, BUT the fields should be foreign keys to keep integrity, and in some cases those fields can be the primary key too, if duplicate links are not allowed. Still, to keep things simple, because exceptions come up often in programming, a primary key should be present to keep the integrity of your data.
Here are my own rules of thumb, which I have settled on after 25+ years of development experience.
All tables should have a single-column primary key that auto-increments.
Include it in any view that is meant to be updateable.
The primary key should not have any meaning in the context of your application. This means that it should not be a SKU, or an account number or an employee id or any other information that is meaningful to your application. It is merely a unique key associated with an entity.
The primary key is used by the database for optimization purposes and should not be used by your application for anything more than identifying a particular entity or relating to a particular entity.
Always having a single value primary key makes performing UPSERTs very straightforward.
Favor multiple indices on single columns over multi-column indices. For example, if you have a two column key, favor creating an index on each column over creating a two column index. If we create a multi-column key on firstname + lastname, we can't do indexed lookups on lastname without providing a firstname as well. Having indices on both columns allows the optimizer to perform indexed lookups on either or both columns regardless of how they are expressed in your WHERE clause.
If your tables are massive, explore partitioning the table into segments based on the most prominent search criteria.
If you have a table that has a significant number of Id fields in it, consider removing all except the primary key to a single table which has an id (PK), an org_id (FK to original table) and an id_type column. Create indices for all columns on the new table and relate it to the original table. In this manner, you can now perform indexed lookups of any number of ids using only a single index.
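The firstname + lastname point above can be checked with a query planner. The sketch below uses SQLite through Python's sqlite3 module purely as a stand-in; the table and index names are assumptions for the example, and other engines (or the same engine with statistics gathered) may plan differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Variant 1: one two-column index.
    CREATE TABLE people_multi (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, city TEXT);
    CREATE INDEX ix_multi_name ON people_multi (firstname, lastname);

    -- Variant 2: one index per column.
    CREATE TABLE people_single (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, city TEXT);
    CREATE INDEX ix_single_first ON people_single (firstname);
    CREATE INDEX ix_single_last  ON people_single (lastname);
""")

def plan(table):
    """Return the planner's description of a lookup by lastname only."""
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT city FROM {table} WHERE lastname = ?", ("Smith",)
    ).fetchall()
    return rows[0][3]  # the human-readable detail column

multi_plan = plan("people_multi")    # cannot seek on lastname alone: falls back to a scan
single_plan = plan("people_single")  # seeks using the lastname index

print(multi_plan)
print(single_plan)
```

The exact wording of the plan text varies between SQLite versions, but the first plan is a scan and the second is an index search. Note that the two-column index would still serve a lookup on firstname alone, since firstname is its leading column.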
What is special about the primary key?
What is the purpose of a table in a schema? What is the purpose of a key of a table? What is special about the primary key? The discussions around primary keys seem to miss the point that the primary key is part of a table, and that table is part of a schema. What is best for the table and table relationships should drive the key that is used.
Tables (and table relationships) contain facts about information you wish to record. These facts should be self-contained, meaningful, easily understood, and non-contradictory. From a design perspective, other tables added or removed from a schema should not impact on the table in question. There must be a purpose for storing the data related only to the information itself. Understanding what is stored in a table should not require undergoing a scientific research project. No fact stored for the same purpose should be stored more than once. Keys are a whole or part of the information being recorded which is unique, and the primary key is the specially designated key that is to be the primary access point to the table (i.e. it should be chosen for data consistency and usage, not just insert performance).
ASIDE: The unfortunate side effect of most databases being designed and developed by application programmers (which I am sometimes) is that what is best for the application or application framework often drives the primary key choice for tables. This leads to integer and GUID keys (as these are simple to use for application frameworks) and monolithic table designs (as these reduce the number of application framework objects needed to represent the data in memory). These application-driven database design decisions lead to significant data consistency problems when used at scale. Application frameworks designed in this manner naturally lead to table-at-a-time designs. “Partial records” are created in tables and data is filled in over time. Multi-table interaction is avoided or, when used, causes inconsistent data when the application functions improperly. These designs lead to data that is meaningless (or difficult to understand), data spread over tables (you have to look at other tables to make sense of the current table), and duplicated data.
It was said that primary keys should be as small as necessary. I would say that keys should be only as large as necessary. Randomly adding meaningless fields to a table should be avoided. It is even worse to make a key out of a randomly added meaningless field, especially when it destroys the join dependency from another table to the non-primary key. This is only reasonable if there are no good candidate keys in the table, but this occurrence is surely a sign of a poor schema design if used for all tables.
It was also said that primary keys should never change as updating a primary key should always be out of the question. But update is the same as delete followed by insert. By this logic, you should never delete a record from a table with one key and then add another record with a second key. Adding the surrogate primary key does not remove the fact that the other key in the table exists. Updating a non-primary key of a table can destroy the meaning of the data if other tables have a dependency on that meaning through a surrogate key (e.g. a status table with a surrogate key having the status description changed from ‘Processed’ to ‘Cancelled’ would definitely corrupt the data). What should always be out of the question is destroying data meaning.
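The status-table example above can be made concrete with a small sketch. It uses Python's sqlite3 module as a stand-in database; the `status` and `orders` tables and their rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE status (
        status_id   INTEGER PRIMARY KEY,   -- surrogate key
        description TEXT NOT NULL UNIQUE   -- the key that actually carries the meaning
    );
    CREATE TABLE orders (
        order_id  INTEGER PRIMARY KEY,
        status_id INTEGER NOT NULL REFERENCES status (status_id)
    );
    INSERT INTO status VALUES (1, 'Processed');
    INSERT INTO orders VALUES (100, 1), (101, 1);
""")

def order_statuses():
    """Return (order_id, status description) pairs as a report would show them."""
    return conn.execute("""
        SELECT o.order_id, s.description
        FROM orders o JOIN status s ON s.status_id = o.status_id
        ORDER BY o.order_id
    """).fetchall()

before = order_statuses()
# No constraint objects to this update: the surrogate key 1 is untouched.
conn.execute("UPDATE status SET description = 'Cancelled' WHERE status_id = 1")
after = order_statuses()

print(before)  # both orders read 'Processed'
print(after)   # both orders now read 'Cancelled'
```

Every foreign key is still valid after the update, yet the recorded fact about each order has silently changed, which is the kind of destroyed meaning the paragraph warns about.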
Having said this, I am grateful for the many poorly designed databases that exist in businesses today (meaningless-surrogate-keyed-data-corrupted-1NF behemoths), because that means there is an endless amount of work for people that understand proper database design. But on the sad side, it does sometimes make me feel like Sisyphus, but I bet he had one heck of a 401k (before the crash). Stay away from blogs and websites for important database design questions. If you are designing databases, look up CJ Date. You can also reference Celko for SQL Server, but only if you hold your nose first. On the Oracle side, reference Tom Kyte.
Natural versus artificial keys to me is a matter of how much of the business logic you want in your database. Social Security number (SSN) is a great example.
"Each client in my database will, and must, have an SSN." Bam, done, make it the primary key and be done with it. Just remember when your business rule changes you're burned.
I don't like natural keys myself, due to my experience with changing business rules. But if you're sure it won't change, it might prevent a few critical joins.
I too always use a numeric ID column. In Oracle I use number(18,0), for no real reason over number(12,0) (or whatever is an int rather than a long); maybe I just don't want to ever worry about getting a few billion rows in the db!
I also include a created and modified column (type timestamp) for basic tracking, where it seems useful.
I don't mind setting up unique constraints on other combinations of columns, but I really like my id, created, modified baseline requirements.
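A baseline of id, created and modified might look something like the sketch below. It is written against SQLite via Python's sqlite3 module rather than Oracle, and the `widget` table, its `name` column and the trigger are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE widget (
        id       INTEGER PRIMARY KEY,                     -- numeric surrogate id
        name     TEXT NOT NULL UNIQUE,                    -- unique constraint on a business column
        created  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        modified TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    -- Keep 'modified' current whenever the business data changes.
    CREATE TRIGGER widget_touch AFTER UPDATE OF name ON widget
    BEGIN
        UPDATE widget SET modified = CURRENT_TIMESTAMP WHERE id = NEW.id;
    END;
""")

conn.execute("INSERT INTO widget (name) VALUES ('bolt')")
conn.execute("UPDATE widget SET name = 'nut' WHERE id = 1")
row = conn.execute("SELECT id, name, created, modified FROM widget").fetchone()
print(row)
```

The application never has to supply the tracking columns; the defaults and the trigger fill them in.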
I look for natural primary keys and use them where I can.
If no natural keys can be found, I prefer a GUID to an INT++ because SQL Server uses trees, and it is bad to always add keys to the end in trees.
On tables that are many-to-many couplings I use a compound primary key of the foreign keys.
Because I'm lucky enough to use SQL Server I can study execution plans and statistics with the profiler and the query analyzer and find out how my keys are performing very easily.
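The compound primary key on a many-to-many table mentioned above can be sketched as follows. This uses SQLite through Python's sqlite3 module instead of SQL Server, and the `event`, `attendee` and `event_attendee` tables are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE event    (event_id    INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE attendee (attendee_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE event_attendee (
        event_id    INTEGER NOT NULL REFERENCES event (event_id),
        attendee_id INTEGER NOT NULL REFERENCES attendee (attendee_id),
        PRIMARY KEY (event_id, attendee_id)   -- compound key made of the two foreign keys
    );
    INSERT INTO event VALUES (1, 'Launch');
    INSERT INTO attendee VALUES (10, 'Ann');
    INSERT INTO event_attendee VALUES (1, 10);
""")

def try_link(event_id, attendee_id):
    """Attempt to link an attendee to an event and report what happened."""
    try:
        conn.execute("INSERT INTO event_attendee VALUES (?, ?)", (event_id, attendee_id))
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

duplicate_link = try_link(1, 10)  # same pair again: blocked by the compound primary key
dangling_link = try_link(1, 99)   # attendee 99 does not exist: blocked by the foreign key
print(duplicate_link, dangling_link)
```

No extra surrogate column is needed here: the pair of foreign keys is already unique and is the only thing anyone would look the row up by.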
All tables should have a primary key. Otherwise, what you have is a HEAP - this, in some situations, might be what you want (heavy insert load when the data is then replicated via a service broker to another database or table for instance).
For lookup tables with a low volume of rows, you can use a 3 CHAR code as the primary key as this takes less room than an INT, but the performance difference is negligible. Other than that, I would always use an INT unless you have a reference table that perhaps has a composite primary key made up from foreign keys from associated tables.
If you really want to read through all of the back and forth on this age-old debate, do a search for "natural key" on Stack Overflow. You should get back pages of results.
We do a lot of joins and composite primary keys have just become a performance hog. A simple int or long takes care of many problems even though you are introducing a second candidate key, but it's a lot easier and more understandable to join on one field versus three.
I'll be up-front about my preference for natural keys - use them where possible, as they'll make your life of database administration a lot easier. I established a standard in our company that all tables have the following columns:
Row ID (GUID)
Creator (string; has a default of the current user's name (SUSER_SNAME() in T-SQL))
Created (DateTime)
Timestamp
Row ID has a unique key on it per table, and in any case is auto-generated per row (and permissions prevent anyone editing it), and is reasonably guaranteed to be unique across all tables and databases. If any ORM systems need a single ID key, this is the one to use.
Meanwhile, the actual PK is, if possible, a natural key. My internal rules are something like:
People - use surrogate key, e.g. INT. If it's internal, the Active Directory user GUID is an acceptable choice
Lookup tables (e.g. StatusCodes) - use a short CHAR code; it's easier to remember than INTs, and in many cases the paper forms and users will also use it for brevity (e.g. Status = "E" for "Expired", "A" for "Approved", "NADIS" for "No Asbestos Detected In Sample")
Linking tables - combination of FKs (e.g. EventId, AttendeeId)
So ideally you end up with a natural, human-readable and memorable PK, and an ORM-friendly one-ID-per-table GUID.
Caveat: the databases I maintain tend to the 100,000s of records rather than millions or billions, so if you have experience of larger systems which contraindicates my advice, feel free to ignore me!
I follow a few rules:
Primary keys should be as small as necessary. Prefer a numeric type because numeric types are stored in a much more compact format than character formats. This matters because most primary keys will be foreign keys in another table as well as used in multiple indexes. The smaller your key, the smaller the index, and the fewer pages in the cache you will use.
Primary keys should never change. Updating a primary key should always be out of the question. This is because it is most likely to be used in multiple indexes and used as a foreign key. Updating a single primary key could cause a ripple effect of changes.
Do NOT use "your problem primary key" as your logical model primary key. For example, passport number, social security number, or employee contract number: these "natural keys" can change in real world situations. Make sure to add UNIQUE constraints for these where necessary to enforce consistency.
On surrogate vs natural key, I refer to the rules above. If the natural key is small and will never change it can be used as a primary key. If the natural key is large or likely to change I use surrogate keys. If there is no primary key I still make a surrogate key because experience shows you will always add tables to your schema and wish you'd put a primary key in place.
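To illustrate the rules above, here is a minimal sketch of a surrogate primary key with a UNIQUE constraint on the natural key, using SQLite via Python's sqlite3 module. The `member` and `claim` tables and the sample values are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE member (
        member_id INTEGER PRIMARY KEY,   -- small numeric surrogate key, never updated
        ssn       TEXT NOT NULL UNIQUE   -- natural key, kept consistent by UNIQUE
    );
    CREATE TABLE claim (
        claim_id  INTEGER PRIMARY KEY,
        member_id INTEGER NOT NULL REFERENCES member (member_id)
    );
    INSERT INTO member VALUES (1, '000-00-0001');
    INSERT INTO claim  VALUES (500, 1), (501, 1);
""")

# Correcting the natural key touches exactly one row; no foreign key has to change.
cur = conn.execute("UPDATE member SET ssn = '000-00-0010' WHERE member_id = 1")
rows_changed = cur.rowcount
claims_for_member = conn.execute(
    "SELECT COUNT(*) FROM claim WHERE member_id = 1"
).fetchone()[0]
print(rows_changed, claims_for_member)
```

Had `ssn` been the primary key, the same correction would have had to ripple through every row of `claim` and every index built on it.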
Natural versus artificial keys is a kind of religious debate among the database community - see this article and others it links to. I'm neither in favour of always having artificial keys, nor of never having them. I would decide on a case-by-case basis, for example:
US States: I'd go for state_code ('TX' for Texas etc.), rather than state_id=1 for Texas.
Employees: I'd usually create an artificial employee_id, because it's hard to find anything else that works. SSN or equivalent may work, but there could be issues like a new joiner who hasn't supplied his/her SSN yet.
Employee Salary History: (employee_id, start_date). I would not create an artificial employee_salary_history_id. What point would it serve (other than "foolish consistency")?
Wherever artificial keys are used, you should always also declare unique constraints on the natural keys. For example, use state_id if you must, but then you'd better declare a unique constraint on state_code, otherwise you are sure to eventually end up with:
I avoid using natural keys for one simple reason -- human error. Although natural unique identifiers are often available (SSN, VIN, Account Number, etc.), they require a human to enter them correctly. If you're using SSNs as a primary key, someone transposes a couple of numbers during data entry, and the error isn't discovered immediately, then you're faced with changing your primary key.
My primary keys are all handled by the database program in the background and the user is never aware of them.
Just an extra comment on something that is often overlooked. Sometimes not using a single surrogate key as primary has benefits in the child tables. Let's say we have a design that allows you to run multiple companies within the one database (maybe it's a hosted solution, or whatever).
Let's say we have these tables and columns:
Company:
    CompanyId (primary key)
CostCentre:
    CompanyId (primary key, foreign key to Company)
    CostCentre (primary key)
CostElement:
    CompanyId (primary key, foreign key to Company)
    CostElement (primary key)
Invoice:
    InvoiceId (primary key)
    CompanyId (primary key, in foreign key to CostCentre, in foreign key to CostElement)
    CostCentre (in foreign key to CostCentre)
    CostElement (in foreign key to CostElement)
In case that last bit doesn't make sense, Invoice.CompanyId is part of two foreign keys, one to the CostCentre table and one to the CostElement table. The primary key is (InvoiceId, CompanyId).
In this model, it's not possible to screw up and reference a CostElement from one company and a CostCentre from another company. If a single surrogate key were used as the primary key on the CostElement and CostCentre tables, without these composite foreign key relations in the Invoice table, it would be possible.
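The multi-company model above can be tried out with a short sketch. It uses SQLite through Python's sqlite3 module as a stand-in; the table and column names follow the listing above, while the column types and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Company (CompanyId INTEGER PRIMARY KEY);
    CREATE TABLE CostCentre (
        CompanyId  INTEGER NOT NULL REFERENCES Company (CompanyId),
        CostCentre TEXT NOT NULL,
        PRIMARY KEY (CompanyId, CostCentre)
    );
    CREATE TABLE CostElement (
        CompanyId   INTEGER NOT NULL REFERENCES Company (CompanyId),
        CostElement TEXT NOT NULL,
        PRIMARY KEY (CompanyId, CostElement)
    );
    CREATE TABLE Invoice (
        InvoiceId   INTEGER NOT NULL,
        CompanyId   INTEGER NOT NULL,
        CostCentre  TEXT NOT NULL,
        CostElement TEXT NOT NULL,
        PRIMARY KEY (InvoiceId, CompanyId),
        -- CompanyId takes part in both composite foreign keys.
        FOREIGN KEY (CompanyId, CostCentre)  REFERENCES CostCentre  (CompanyId, CostCentre),
        FOREIGN KEY (CompanyId, CostElement) REFERENCES CostElement (CompanyId, CostElement)
    );
    INSERT INTO Company VALUES (1), (2);
    INSERT INTO CostCentre  VALUES (1, 'SALES');
    INSERT INTO CostElement VALUES (1, 'TRAVEL'), (2, 'RENT');
""")

def add_invoice(invoice_id, company_id, cost_centre, cost_element):
    """Try to insert an invoice; return True if the database accepted it."""
    try:
        conn.execute(
            "INSERT INTO Invoice VALUES (?, ?, ?, ?)",
            (invoice_id, company_id, cost_centre, cost_element),
        )
        return True
    except sqlite3.IntegrityError:
        return False

same_company = add_invoice(1, 1, "SALES", "TRAVEL")  # both belong to company 1
mixed_company = add_invoice(2, 1, "SALES", "RENT")   # RENT exists only for company 2
print(same_company, mixed_company)
```

The second insert fails because there is no (1, 'RENT') row in CostElement, so the database itself refuses the cross-company mix without any application-side check.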
Besides all those good answers, I just want to share a good article I just read, The great primary-key debate.
Just to quote a few points:
The developer must apply a few rules when choosing a primary key for each table:
Natural keys (tend to) break the rules. Surrogate keys comply with the rules. (You'd better read through that article; it is worth your time!)
A natural key, if available, is usually best. So, if datetime/char uniquely identifies the row and both parts are meaningful to the row, that's great.
If just the datetime is meaningful, and the char is just tacked on to make it unique, then you might as well just go with an identity field.
I suspect Steven A. Lowe's rolled up newspaper therapy is required for the designer of the original data structure.
As an aside, GUIDs as a primary key can be a performance hog. I wouldn't recommend it.
You should use a 'composite' or 'compound' primary key that consists of multiple fields.
This is a perfectly acceptable solution, go here for more info :)
I always use an autonumber or identity field.
I worked for a client who had used SSN as a primary key and then because of HIPAA regulations was forced to change to a "MemberID" and it caused a ton of problems when updating the foreign keys in related tables. Sticking to a consistent standard of an identity column has helped me avoid a similar problem in all of my projects.
GUIDs can be used as a primary key, but you need to create the right type of GUID so that it performs well.
You need to generate COMB GUIDs. A good article about it and performance statistics is The Cost of GUIDs as Primary Keys.
Also, some code on building COMB GUIDs in SQL is in Uniqueidentifier vs identity (archive).
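For readers who have not met COMB GUIDs, the rough idea as it is commonly described is to keep most of the GUID random but overwrite the last six bytes with a timestamp, so that new values sort after older ones in SQL Server, which gives those bytes the most weight when ordering uniqueidentifier values. The Python sketch below only illustrates that byte layout; the `comb_guid` function and its millisecond timestamp parameter are assumptions for the example, not the code from the linked articles.

```python
import uuid

def comb_guid(timestamp_ms: int) -> uuid.UUID:
    """Build a COMB-style GUID: 10 random bytes followed by a 6-byte timestamp."""
    random_part = uuid.uuid4().bytes[:10]
    # Keep the low 48 bits of the timestamp, most significant byte first.
    time_part = (timestamp_ms & 0xFFFFFFFFFFFF).to_bytes(6, "big")
    return uuid.UUID(bytes=random_part + time_part)

earlier = comb_guid(1_700_000_000_000)
later = comb_guid(1_700_000_000_001)

# The trailing six bytes grow with time, which is what keeps inserts near the end of the index.
print(earlier, later)
print(earlier.bytes[10:] < later.bytes[10:])
```

In practice you would pass the current time, for example `int(time.time() * 1000)`, and measure on your own workload whether the ordering helps.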
所有表都应该有一个主键。 否则,您拥有的是堆 - 在某些情况下,这可能就是您想要的(例如,当数据通过服务代理复制到另一个数据库或表时,会产生大量插入负载)。
对于行数较少的查找表,您可以使用 3 CHAR 代码作为主键,因为这比 INT 占用的空间更少,但性能差异可以忽略不计。 除此之外,我将始终使用 INT,除非您有一个引用表,该表可能具有由关联表中的外键组成的复合主键。
All tables should have a primary key. Otherwise, what you have is a HEAP - this, in some situations, might be what you want (heavy insert load when the data is then replicated via a service broker to another database or table for instance).
For lookup tables with a low volume of rows, you can use a 3 CHAR code as the primary key as this takes less room than an INT, but the performance difference is negligible. Other than that, I would always use an INT unless you have a reference table that perhaps has a composite primary key made up from foreign keys from associated tables.
如果您确实想通读有关这一古老争论的所有来回内容,请在 Stack Overflow 上搜索“自然键”。 您应该得到结果页面。
If you really want to read through all of the back and forth on this age-old debate, do a search for "natural key" on Stack Overflow. You should get back pages of results.
我们做了很多连接,复合主键刚刚成为性能消耗者。 即使您引入了第二个候选键,一个简单的 int 或 long 也可以解决许多问题,但与三个字段相比,在一个字段上加入要容易得多且更容易理解。
We do a lot of joins and composite primary keys have just become a performance hog. A simple int or long takes care of many problems even though you are introducing a second candidate key, but it's a lot easier and more understandable to join on one field versus three.
I'll be up-front about my preference for natural keys - use them where possible, as they'll make your life of database administration a lot easier. I established a standard in our company that all tables have the following columns:
(... SUSER_SNAME() in T-SQL)
Row ID has a unique key on it per table, is in any case auto-generated per row (permissions prevent anyone from editing it), and is reasonably guaranteed to be unique across all tables and databases. If any ORM system needs a single ID key, this is the one to use.
Meanwhile, the actual PK is, if possible, a natural key. My internal rules are something like:
(... EventId, AttendeeId)
So ideally you end up with a natural, human-readable and memorable PK, and an ORM-friendly one-ID-per-table GUID.
Caveat: the databases I maintain tend to the 100,000s of records rather than millions or billions, so if you have experience of larger systems which contraindicates my advice, feel free to ignore me!
I follow a few rules:
On surrogate vs natural keys, I refer to the rules above. If the natural key is small and will never change, it can be used as the primary key. If the natural key is large or likely to change, I use a surrogate key. If there is no primary key, I still make a surrogate key, because experience shows you will always add tables to your schema and wish you'd put a primary key in place.
Natural versus artificial keys is a kind of religious debate in the database community - see this article and the others it links to. I'm neither in favour of always having artificial keys, nor of never having them. I would decide on a case-by-case basis, for example:
Wherever artificial keys are used, you should always also declare unique constraints on the natural keys. For example, use state_id if you must, but then you'd better declare a unique constraint on state_code, otherwise you are sure to eventually end up with the same state stored more than once under different state_id values.
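A minimal sketch of that advice, using Python's built-in sqlite3 module as a stand-in for whatever database you actually run. The state_id and state_code columns come from the text above; the table name state and the state_name column are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE state (
        state_id   INTEGER PRIMARY KEY,   -- artificial key
        state_code TEXT NOT NULL UNIQUE,  -- natural key, still enforced
        state_name TEXT NOT NULL          -- hypothetical extra column
    )
""")
conn.execute("INSERT INTO state (state_code, state_name) VALUES ('TX', 'Texas')")

# A second 'TX' row would get a fresh state_id, so only the UNIQUE
# constraint on state_code stands between us and a duplicate state.
try:
    conn.execute("INSERT INTO state (state_code, state_name) VALUES ('TX', 'Texas')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM state").fetchone()[0]
```

Without the UNIQUE clause the second insert would succeed, and other tables could then point at either of the two rows.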
I avoid using natural keys for one simple reason -- human error. Although natural unique identifiers are often available (SSN, VIN, Account Number, etc.), they require a human to enter them correctly. If you're using SSNs as a primary key, someone transposes a couple of numbers during data entry, and the error isn't discovered immediately, then you're faced with changing your primary key.
My primary keys are all handled by the database program in the background and the user is never aware of them.
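As a rough illustration of why that matters, the sketch below (Python with sqlite3; the person and account tables and all column names are hypothetical) stores the SSN as an ordinary unique column. Fixing a mistyped value is then a one-row update, and the rows that reference the person are untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,   -- surrogate key, never shown to users
        ssn       TEXT NOT NULL UNIQUE   -- natural identifier, typed by a human
    );
    CREATE TABLE account (
        account_id INTEGER PRIMARY KEY,
        person_id  INTEGER NOT NULL REFERENCES person (person_id)
    );
""")

# Data entry transposes the last two digits of a (made-up) SSN.
conn.execute("INSERT INTO person (person_id, ssn) VALUES (1, '123-45-6798')")
conn.execute("INSERT INTO account (account_id, person_id) VALUES (10, 1)")

# The correction touches one column in one row; no key values change.
conn.execute("UPDATE person SET ssn = '123-45-6789' WHERE person_id = 1")

owner_ssn = conn.execute("""
    SELECT p.ssn
    FROM account AS a
    JOIN person AS p ON p.person_id = a.person_id
    WHERE a.account_id = 10
""").fetchone()[0]
```

Had ssn been the primary key, the same correction would have had to be propagated to every referencing row.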
Just an extra comment on something that is often overlooked. Sometimes not using a single surrogate key as primary has benefits in the child tables. Let's say we have a design that allows you to run multiple companies within the one database (maybe it's a hosted solution, or whatever).
Let's say we have these tables and columns:
In case that last bit doesn't make sense, Invoice.CompanyId is part of two foreign keys, one to the CostCentre table and one to the CostElement table. The primary key is (InvoiceId, CompanyId). In this model, it's not possible to screw up and reference a CostElement from one company and a CostCentre from another company. If a single surrogate key were used as the primary key on the CostElement and CostCentre tables, without those foreign key relations in the Invoice table, it would be possible.
The fewer chances to screw up, the better.
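Here is a small sketch of such a design, again using Python's sqlite3 module. The table names, CompanyId, InvoiceId and the (InvoiceId, CompanyId) primary key follow the description above; the CostCentreId and CostElementId column names and the sample values are assumptions, not the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE CostCentre (
        CompanyId    INTEGER NOT NULL,
        CostCentreId INTEGER NOT NULL,
        PRIMARY KEY (CompanyId, CostCentreId)
    );
    CREATE TABLE CostElement (
        CompanyId     INTEGER NOT NULL,
        CostElementId INTEGER NOT NULL,
        PRIMARY KEY (CompanyId, CostElementId)
    );
    CREATE TABLE Invoice (
        InvoiceId     INTEGER NOT NULL,
        CompanyId     INTEGER NOT NULL,
        CostCentreId  INTEGER NOT NULL,
        CostElementId INTEGER NOT NULL,
        PRIMARY KEY (InvoiceId, CompanyId),
        -- CompanyId takes part in both foreign keys
        FOREIGN KEY (CompanyId, CostCentreId)
            REFERENCES CostCentre (CompanyId, CostCentreId),
        FOREIGN KEY (CompanyId, CostElementId)
            REFERENCES CostElement (CompanyId, CostElementId)
    );
    INSERT INTO CostCentre  VALUES (1, 100);
    INSERT INTO CostElement VALUES (1, 500);
    INSERT INTO CostElement VALUES (2, 900);
""")

# Everything belongs to company 1: accepted.
conn.execute("INSERT INTO Invoice VALUES (1, 1, 100, 500)")

# Cost element 900 belongs to company 2, so (1, 900) has no parent row.
try:
    conn.execute("INSERT INTO Invoice VALUES (2, 1, 100, 900)")
    mixed_company_rejected = False
except sqlite3.IntegrityError:
    mixed_company_rejected = True

invoice_count = conn.execute("SELECT COUNT(*) FROM Invoice").fetchone()[0]
```

Because the company is part of every key, the database itself rejects the cross-company reference, and no application-level check is needed.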