在我的团队中进行了一场相当热烈的讨论,我被迫思考大多数人喜欢什么作为主键。 我们有以下组 -
- Int/ BigInt,其中自动增量是足够好的主键。
- 主键应至少由 3 列组成。
- Id、GUID 和人类可读的行标识符都应该区别对待。
PK的最佳方法是什么? 如果你能证明你的观点是正确的,那就太好了。 有比上面更好的方法吗?
编辑:任何人都有一个简单的示例/算法来为可扩展的行生成人类可读的标识符?
In a fairly animated discussion in my team I was made to think what most people like as primary keys. We had the following groups-
- Int/ BigInt which autoincrement are good enough primary keys.
- There should be at least 3 columns that make up the primary key.
- Id, GUID and human readable row identifiers all should be treated differently.
What's the best approach for PKs? It would be awesome if you could justify your opinion. Is there a better approach that the above?
EDIT: Anyone has a simple sample/algorithm to generate human readable identifiers for rows that scales well?
发布评论
评论(26)
只是稍微相关,但是当我有小型分类表(本质上是那些在代码中表示 ENUM 的表)时,我最近开始做的一件事是,我将主键设置为 char(3) 或 char(4)。 然后我让这些主键代表查找值。
例如,我有一个针对内部销售代理的报价系统。 我们有“成本类别”,每个报价行项目都被分配其中之一...所以我有一个名为“tCostCategories”的类型查找表,其中主键是“MTL”、“SVC”、“TRV”、“TAX”, 'ODC'。 查找表中的其他列存储更多详细信息,例如代码的正常英语含义、“材料”、“服务”、“旅行”、“税费”、“其他直接成本”等等。
这真的很好,因为它不使用比 int 更多的空间,并且当您查看源数据时,您不必链接查找表来知道该值到底是什么。 例如,报价行可能如下所示:
1 PartNumber $40 MTL
2 其他零件编号 $29.99 SVC
3 PartNumber2 $150 TRV
使用 int 来表示类别,然后在所有行上链接 1、2、3 要容易得多 - 数据就在您面前,而且性能似乎不受影响根本没有(我没有真正测试过。)
就真正的问题而言......我喜欢 RowGUID uniqueidentifiers。 我不是 100% 同意这一点,但是不是所有行都有内部 RowGuid 吗? 如果是这样,那么使用 RowGuid 实际上会比整数(或其他任何与此相关的东西)占用更少的空间。我所知道的是,如果它足以让 M$ 在 GreatPlains 中使用,那么它对我来说就足够了。 (我应该躲起来吗??)
Only slightly relevant, but one thing I've started doing recently when I have small classification tables (essentially those that would represent ENUMs in code) is that I'll make the primary key a char(3) or char(4). Then I make those primary keys representative of the lookup value.
For example, I have a quoting system for our internal Sales Agents. We have "Cost Categories" that every quote line item is assigned one of... So I have a type lookup table called 'tCostCategories', where primary key is 'MTL', 'SVC', 'TRV', 'TAX', 'ODC'. Other columns in the lookup table store more details, such as the normal english meanings of the codes, "Material", "Service", "Travel", "Taxes", "Other Direct Costs", and so forth.
This is really nice because it doesn't use any more space than an int, and when you are looking at the source data, you don't have to link the lookup table to know what the heck the value is. For example, a quote row might look like:
1 PartNumber $40 MTL
2 OtherPartNumber $29.99 SVC
3 PartNumber2 $150 TRV
It's much easier that using an int to represent the categories and then linking 1, 2, 3 on all the lines - you have the data right there in front of you, and the performance doesn't seem affected at all (not that I've truly tested.)
As far as the real question goes... I like RowGUID uniqueidentifiers. I'm not 100% on this, but don't all rows have internal RowGuid's anyway?? If so, then using the RowGuid would actually take less space than ints (or anything else for that matter.) All I know is that if it's good enough for M$ to use in GreatPlains then it's good enough for me. (Should I duck??)
哦,我使用 GUID 的另一个原因是 - 我使用分层数据结构。 也就是说,我有一个主键匹配的表“公司”和表“供应商”。 但我还有一个表“制造商”,它也“继承”自公司。 供应商和制造商常用的字段不会出现在这些表中 - 它们出现在公司中。 在这种设置中,使用 int 比 Guid 更痛苦。 至少,您不能使用身份主键。
Oh one more reason I use GUIDs - I use a hierarchical data structure. That is, I have a table 'Company' and a table 'Vendor' for which the Primary Keys match up. But I also have a table 'Manufacturer' that also 'inherits' from Company. The fields that are common to Vendors and Manufacturers don't appear in those tables - they appear in Company. In this setup, using int's is much more painful than Guids. In the very least, you can't use identity primary keys.
我喜欢自然钥匙,只要我可以信任它们。 我愿意支付少量的性能价格,以便使用对主题专家有意义的密钥。
对于描述实体的表,应该有一个简单的自然键,可以像主题人员那样识别各个实例。 如果主题没有实体之一的可信标识符,那么我将诉诸代理密钥。
对于描述关系的表,我使用复合键,其中每个组件引用参与关系的实体,因此引用实体表中的一行。 同样,使用复合键对性能的影响通常很小。
正如其他人指出的那样,“主键”一词有点误导。 在关系数据模型中,使用的术语是“候选键”。 单个表可能有多个候选键。 从逻辑上讲,每个人都与另一个人一样好。 选择其中一个作为“主要”并通过该键进行所有引用只是设计者可以做出的选择。
I like natural keys, whenever I can trust them. I'm willing to pay a small performance price price in order to use keys that make sense to the subject matter experts.
For tables that describe entities, there should be a simple natural key that identifies individual instances the same way the subject matter people do. If the subject matter does not have trustworthy identifiers for one of the entities, then I'll resort to a surrogate key.
For tables that describe relationships, I use a compound key, where each component references an entity that participates in the relationship, and therefore a row in an entity table. Again, the performance hit for using a compound key is generally minimal.
As others have pointed out, the term "primary key" is a little misleading. In the Relational Data Model, the term that's used is "candidate keys". There could be several candidate keys for a single table. Logically, each one is just as good as another. Choosing one of them as "primary" and making all references via that key is simply a choice the designer can make.
指导.期间。
如果您需要横向扩展或需要通过其他方式分配主键,他们将是您的朋友。 您可以为其他所有内容添加索引。
更新以澄清我的声明。
我曾在很多不同类型的网站上工作过。 从小型单服务器交易到由多个数据库和 Web 服务器支持的大型交易。 当然,有些应用程序可以使用自动递增整数作为主键。 然而,这些不符合我做事的模式。
使用 GUID 时,您可以在任何地方生成 ID。 它可以由远程服务器、您的 Web 应用程序、数据库本身生成,甚至可以在多主机情况下的多个数据库中生成。
另一方面,自动递增的 INT 只能在主数据库内安全地生成。 再次强调,如果您有一个应用程序与该支持数据库服务器密切相关并且横向扩展不是您所关心的事情,那么这可能没问题。
当然,使用 GUID 意味着您必须每晚进行重新索引过程。 但是,如果您使用除自动递增 INT 之外的任何其他内容,您无论如何都应该这样做。 哎呀,即使使用 INT 作为主索引,您也可能有其他需要重新生成的索引来处理碎片。 因此,使用 GUID 并不会增加另一个问题,因为无论如何都需要执行这些任务。
如果您查看较大的应用程序,您会注意到一些重要的事情:它们都使用 Base64 编码的 GUID 作为密钥。 原因很简单,使用 GUID 可以让您轻松横向扩展,而尝试横向扩展 INT 时可能会遇到很多麻烦。
我们最新的应用程序经历了一段持续大约一个月的大量插入期。 之后,90% 以上的查询都会选择用于报告。 为了增加容量,我可以在这个大插入期间启动额外的数据库服务器; 然后轻松地将它们合并到单个数据库中以进行报告。 尝试用 INT 来做到这一点绝对是一场噩梦。
坦率地说,任何时候您对数据库进行集群或设置复制时,数据库服务器都会要求您在表上拥有 GUID。 因此,如果您认为您的系统可能需要扩展,那么请选择合适的系统。
Guids.period.
In the event that you need to scale out or you need to assign the primary key by alternate means they will be your friend. You can add indexes for everything else.
update to clarify my statement.
I've worked on a lot of different kinds of sites. From small single server deals to large ones backed with multiple DB and web servers. There have certainly been apps that would have been just fine with auto incrementing ints as primary keys. However, those don't fit the model of how I do things.
When using a GUID you can generate the ID anywhere. It could be generated by a remote server, your web app, within the database itself or even within multiple databases in a multimaster situation.
On the other hand, an auto incremented INT can only be safely generated within the primary database. Again, this might be okay if you have an application that will be intimately tied to that one backing DB server and scaling out is not something you are concerned with.
Sure, usage of GUIDs mean you have to have nightly reindexing processes. However, if you are using anything other than an auto incremented INT you should do that anyway. Heck, even with an INT as the primary it's likely you have other indexes that need regenerated to deal with fragmentation. Therefore, using GUIDs doesn't exactly add another problem because those tasks need to be performed regardless.
If you take a look at the larger apps out there you will notice something important: they all use Base64 encoded GUIDs as the keys. The reason for this is simple, usage of GUIDs enables you to scale out easily whereas there can be a lot of hoops to jump through when attempting to scale out INTs.
Our latest app goes through a period of heavy inserts that lasts for about a month. After that 90+% of the queries are all selects for reporting. To increase capacity I can bring up additional DB servers during this large insert period; and later easily merge those into a single DB for reporting. Attempting to do that with INTs would be an absolute nightmare.
Quite frankly, any time you cluster a database or setup replication the DB server is going to demand that you have GUIDs on the table anyway. So, if you think that your system might need to grow then pick the one that's good.
无论您是否意识到,这是一个复杂的主题。 可能属于此 StackOverflow 常见问题解答部分。
我不应该在这里问什么样的问题?
避免提出主观的、争论性的或需要长时间讨论的问题。 这是一个可以解答问题的地方!
这个问题已经争论了很多年,而且还将继续争论很多年。 我看到的唯一共识提示是,答案在某种程度上是可预测的,具体取决于您是否询问 OO 人员(GUID 是唯一的出路!)、数据建模者(自然键是唯一的出路!)、或以性能为导向的 DBA(INT 是唯一的出路!)。
This is a complex subject whether you realized it or not. Might fall under the section on this StackOverflow FAQ.
What kind of questions should I not ask here?
Avoid asking questions that are subjective, argumentative, or require extended discussion. This is a place for questions that can be answered!
This has been debated for years and will continue to be debated for years. The only hints of consensus I have seen is that the answers are somewhat predictable depending on if you are asking a OO guy (GUIDs are the only way to go!), a data modeler (Natural keys are the only way to go!), or a performance oriented DBA (INTs are the only way to go!).
如果您要在数据库与偶尔连接的应用程序之间进行任何同步,那么您应该使用 GUID 作为主键。 这对于调试来说是一种痛苦,所以除了这种情况之外,我倾向于坚持使用自动增量的整数。
自动增量整数应该是您的默认值,不使用它们应该是合理的。
If you're going to be doing any syncing between databases with occasionally connected apps, then you should be using GUIDs for your primary keys. It is kind of a pain for debugging, so apart from that case I tend to stick to ints that autoincrement.
Autoincrement ints should be your default, and not using them should be justified.
我没有看到一个答案指出(我认为的)真正的基本点 - 即主键可以保证您不会在表中获得同一现实世界实体的两个条目(如在数据库中建模)。 此观察有助于确定主键的什么是好的选择,什么是坏的选择。
例如,在(美国)州名称和代码表中,名称或代码都可以是主键 - 它们构成两个不同的候选键,并且选择其中一个(通常是较短的 - 代码)作为主键首要的关键。 在函数依赖理论(以及连接依赖 - 1NF 到 5NF)中,候选键比主键更重要。
举个反例,人名通常会做出错误的主键选择。有很多人他们的名字是“约翰·史密斯”或其他类似的名字;甚至考虑到中间名(记住:不是每个人都有中间名 - 例如,我没有),因此,人们有很大的重复空间。他们发明了诸如社会安全号码(SSN)或员工号码之类的人工密钥,并使用它们来指定个人。
理想的主密钥是简短的、唯一的、易于记忆的和自然的。是强制性的;考虑到现实世界数据的限制,其余的必须灵活处理
,因此,在确定给定表的主键时,您必须查看该表中的一组或多组列值。 table 唯一标识表中的每一行? 这些是候选键。 现在,如果每个候选键由 4 或 5 列组成,那么您可能会认为这些太笨拙而无法成为好的主键(主要是因为简短)。 在这种情况下,您可能会引入代理键 - 人工生成的数字。 很多时候(但并非总是)一个简单的 32 位整数就足以用作代理键。 然后,您将此代理键指定为主键。
但是,您仍然必须确保其他候选键(代理键也是候选键,以及所选的主键)都保留为唯一标识符 - 通常通过放置唯一约束在这些列组上。
有时,人们发现很难确定是什么使一行独一无二,但应该有一些东西可以做到这一点,因为简单地重复一条信息并不会让它变得更加真实。 如果您不小心,确实获得了两行(或更多行)声称存储相同的信息,然后您需要更新信息,则存在只更新一行的危险(特别是如果您使用游标)而不是每一行,因此行不同步,没有人知道哪一行包含正确的信息。
从某些方面来看,这是一个相当强硬的观点。
在需要时使用 GUID 并没有什么特别的问题,但它们往往大(如 16-64 字节),而且使用得太频繁。 很多时候,一个完美的 4 字节值就足够了。 使用 4 字节值就足够的 GUID 会浪费磁盘空间,甚至会减慢对数据的索引访问,因为每个索引页的值较少,因此索引会更深,并且必须读取更多页才能访问该数据。信息。
I don't see an answer which points out (what I regard as) the really fundamental point - namely, that a primary key is what guarantees that you won't get two entries in the table for the same real-world entity (as modelled in the database). This observation helps establish what are good and what are bad choices for primary key.
For example, in a table of (US) state names and codes, either the name or the code could be the primary key - they constitute two different candidate keys, and one of them (normally the shorter - the code) is chosen as the primary key. In the theory of functional dependencies (and join dependencies - 1NF through 5NF - it is the candidate keys that are crucial rather than a primary key.
For a counter-example, human names generally make a bad choice for primary key. There are many people who go by the name "John Smith" or some other similar names; even taking middle names into account (remember: not everyone has one - for example, I don't), there is plenty of scope for duplication. Consequently, people do not use names as primary keys. They invent artificial keys such as the Social Security Number (SSN) or Employee Number and use them to designate the individual.
An ideal primary key is short, unique, memorable, and natural. Of these characteristics, uniqueness is mandatory; the rest have to flex given the constraints of real world data.
When it comes to determining the primary key of a given table, therefore, you have to look at what that table represents. What set or sets of column values in the table uniquely identifies each row in the table? Those are the candidate keys. Now, if each candidate key consists of 4 or 5 columns, then you might decide that those are too clumsy to make a good primary key (primarily on grounds of shortness). In those circumstances, you might introduce a surrogate key - an artificially generated number. Very often (but not always) a simple 32-bit integer is sufficient for the surrogate key. You then designate this surrogate key as the primary key.
However, you must still ensure that the other candidate keys (for the surrogate key is a candidate key too, as well as the chosen primary key) are all maintained as unique identifier - normally by placing a unique constraint on those sets of columns.
Sometimes, people find it difficult to identify what makes a row unique, but there should be something to do that, because simply repeating a piece of information doesn't make it any more true. And if you're not careful and do get two (or more) rows purporting to store the same information, and you then need to update the information, there is a danger (especially if you use cursors) that you will update just one row rather than every row, so the rows are out of synchrony and no-one knows which row contains the correct information.
This is a pretty hard-line view, in some respects.
I've no particular problem with using a GUID when they are needed, but they tend to be big (as in 16-64 bytes), and they are used too often. Very often a perfectly good 4-byte value would suffice. Using a GUID where a 4-byte value would suffice wastes disk space, and slows up even indexed access to the data since there are fewer values per index page, so the index will be deeper and more pages have to be read to get to the information.
这只是一个宗教问题,因为人们寻求普遍的正确答案。 事实上,您的团队和这个 SO 线程都表现出如此多的分歧,这应该是一个线索,表明在不同情况下有充分的理由使用您描述的所有解决方案。
state
(CA、TX、NY)时,您不妨使用char(2)
自然键而不是int。id
”代理键(在多对多表中尤其如此)。 在每个表中强制使用三列键绝对是无稽之谈。This is only a religious issue because people seek a universal right answer. The fact that both your team and this SO thread shows so much disagreement should be a clue that there are good reasons to use all the solutions you describe, in different circumstances.
state
(CA, TX, NY) you might as well use achar(2)
natural key instead of an int.id
" surrogate key unnecessarily when a perfectly good compound key exists (this is especially true in many-to-many tables). A mandate for a three-column key in every table is absolute nonsense.我喜欢数据库程序员博客作为来源对于此类信息。
主键有 3 列? 我想说,列应该根据业务规则的要求具有适当的唯一约束,但我仍然有一个单独的代理键。 复合键意味着业务逻辑进入到键中。 如果逻辑发生变化,你的整个模式就会被搞砸。
I like The Database Programmer blog as a source for this kind of info.
3 columns for a primary key? I would say that columns should have appropriate unique constraints as the business rules demand, but I'd still have a separate surrogate key. Compound keys mean business logic enters into the key. If the logic changes, your whole schema is screwed.
我喜欢我的独特。
I like mine unique.
有点偏离主题,但我觉得有必要插话...
如果您的主键是 GUID,不要将其设为聚集索引。 由于 GUID 是非连续的,因此几乎每次插入时数据都会在磁盘上重新排列。 (恶心。)如果使用 GUID 作为主键,它们应该是非聚集索引。
Slightly off-topic, but I feel compelled to chime in with...
If your primary key is a GUID, do not make it a clustered index. Since GUIDs are non-sequential, the data will be re-arranged on disk during almost every insert. (Yuck.) If using GUIDs as primary keys, they should be nonclustered indexes.
我认为在“主”键短语中使用“主”一词实际上是一种误导。
首先,使用“键”的定义,即“键”是表中必须唯一的属性或属性集,
然后,拥有任何键都可以满足多个通常相互不一致的目的。
用作子表中与该父表有关系的一条或多条记录的连接条件。 (在这些子表中显式或隐式定义外键)
(相关)确保子记录必须在父表中具有父记录(子表FK必须作为父表中的Key存在)
提高需要快速定位子表中特定记录/行的查询的性能表。
通过防止将表示同一逻辑实体的重复行插入到表中来确保数据一致性。 (这通常称为“自然”键,应由相对不变的表(实体)属性组成。)
显然,任何无意义的、非自然的键(例如 GUID 或自动生成的整数)是完全无法使用的。 #4。
但通常,对于许多(大多数)表,可以提供#4的完全自然的键通常由多个属性组成,并且太宽,或者太宽以至于将其用于目的#1、#2或 #3 会导致不可接受的性能后果。
答案很简单,对其他子表中的所有连接和 FK 使用简单的自动生成键,但要确保每个需要数据一致性的表(很少有表不需要)。 )有一个备用的自然唯一键,可以防止插入不一致的数据行...另外,如果您总是同时拥有这两个键,那么所有反对使用自然键的反对意见(如果它发生变化怎么办?我必须更改它引用的每个地方作为 FK)变得毫无意义,因为您没有为此使用它...您仅在它是 PK 的一个表中使用它,以避免不一致的重复数据...
至于 GUID,请非常小心地使用它们,因为在索引中使用 GUID 可以减少索引碎片。 用于创建它们的最常见算法将 GUID 的“随机”部分放在最高有效位位置...这增加了添加新行时对常规索引碎片整理/重新索引的要求。
I think the use of the word "Primary", in the phrase "Primary" Key is in a real sense, misleading.
First, use the definition that a "key" is an attribute or set of attributes that must be unique within the table,
Then, having any key serves several often mutually inconsistent purposes.
To use as join conditions to one or many records in child tables which have a relationship to this parent table. (Explicitly or implicitly defining a Foreign Key in those child tables)
(related) Ensuring that child records must have a parent record in the parent table (The child table FK must exist as Key in the parent table)
To increase performance of queries that need to rapidly locate a specific record/row in the table.
To ensure data consistency by preventing duplicate rows which represent the same logical entity from being inserted into the table. (This is often called a "natural" key, and should consist of table (entity) attributes which are relatively invariant.)
Clearly, any non-meaningful, non-natural key (like a GUID or an auto-generated integer is totally incapable of satisfying #4.
But often, with many (most) tables, a totally natural key which can provide #4 will often consist of multiple attributes and be excessively wide, or so wide that using it for purposes #1, #2, or #3 will cause unacceptable performance consequences.
The answer is simple. Use both. Use a simple auto-Generating integral key for all Joins and FKs in other child tables, but ensure that every table that requires data consistency (very few tables don't) have an alternate natural unique key that will prevent inserts of inconsistent data rows... Plus, if you always have both, then all the objections against using a natural key (what if it changes? I have to change every place it is referenced as a FK) become moot, as you are not using it for that... You are only using it in the one table where it is a PK, to avoid inconsistent duplicate data...
As to GUIDs, be very careful using them, as using GUIDs in an index can hose index fragmentation. The most common algorithms used to create them puts the "random" portion of the GUID in the most significant bit positions... This increases the requirement for regular index defragmentation / Reindexing as new rows are added.
我总是使用代理键。 代理键(通常是标识列、自动增量或 GUID)是一种键不存在于数据本身中的键。 另一方面,自然键本身可以唯一标识行。 据我所知,在生活中几乎没有真正的自然键。 连美国的SSN这样的东西都不是天然钥匙。 复合主键是一场即将发生的灾难。 您无法编辑任何数据(这是任何自然键(无论是否是复合键)的主要缺点,但更糟糕的是,使用复合键,现在您必须将该键数据永久保存到每个相关表中。 多么巨大的浪费啊。
现在,为了选择代理键,我坚持使用标识列(我主要在 MS SQL Server 中工作)。 GUID 太大,Microsoft 建议不要使用它们作为 PK。 如果您有多个服务器,您所需要做的就是将增量设置为 10 或 20 或您认为需要同步/扩展的最大服务器数量,并且只需为每个后续服务器上的每个表添加种子即可,并且您永远不会发生数据冲突。
当然,由于增量,我将标识列设置为 BigInt(也称为 long [64 位])。
做一些数学计算,即使增量为 100,表中仍然可以有 92,233,720,368,547,758(> 92 万亿)行。
I always go with the surrogate key. A surrogate key (usually an identity column, autoincrement, or GUID) is one in which the key is not present in the data itself. A natural key, on the other hand, is one that, on its own, uniquely identifies the row. As near as I can tell in life, there are hardly any real natural keys. Not even things like SSN in the United States is a natural key. Composite primary keys are a disaster waiting to happen. You can't edit any of that data (which is the major drawback of any natural key, composite or not), but worse is that with a composite key, now you have to perpetuate that key data into every related table. What a giant waste.
Now, for selection of the surrogate key, I stick with identity columns (I work mostly in MS SQL Server). GUID's are too large and Microsoft recommends against using them as a PK. If you have multiple servers, all you need to do is make the increment 10 or 20 or whatever you think the maximum number of servers you'll ever need to sync/expand to, and just inc the seed for each table on each subsequent server, and you'll never have a data collision.
Of course, because of the increment, I make the identity column a BigInt (otherwise known as a long [64 bits]).
Doing a bit of math, even if you make the increment 100, you can still have 92,233,720,368,547,758 (> 92 quadrillion) rows in your table.
您绝对不应该做的一件事是使用智能钥匙。 这是一把钥匙,有关记录的信息被编码在钥匙本身中,它最终会咬住你。
我在一个地方工作过,那里的主键是账户 ID,它是字母和数字的组合。 我不记得任何具体细节,但是,例如,那些特定类型的帐户将在 600 范围内,而另一种类型的帐户则从 400 开始。那太好了,直到该客户决定同时要求这两个帐户工作类型。 或者改变他们所做的工作类型。
另一个地方,使用树中的位置作为记录的主键。 那么就会有如下的记录。
当然,客户首先想要的是一种移动树中物品的方法。 在那之前整套软件就死掉了。
拜托,拜托,拜托,如果您正在编写我必须维护的代码,请不要使用智能密钥!
One thing you should never do is use a smart key. That is a key where information about the record is coded in the key itself, and it will eventually bite you.
I worked one place, where the primary key was the account ID, which was a combination of letters and numbers. I don't remember any specifics, but, for example, those accounts that were of a certain type, would be in the 600 range, and of another type, started with 400. That was great, until that customer decided to ask for both types of work. Or changed the type of work they did.
Another place, used the location in the tree as the primary key for records. So there would be records like the following.
Of course, the first thing the customers wanted was a way to move items in the tree around. The entire set of software died before that happened.
Please, please, please, if you're writing code that I ever have to maintain, please don't use a smart key!
我喜欢将自动增量作为主键。 我内心深处知道这是一种逃避,但它确实使按添加时间排序数据变得如此容易(ORDER BY ID DESC,例如)。
3 列对于人类的解析来说听起来非常刺耳。
这就是权衡——您需要多少关系功能,而不是让查询它的人可以理解这个表(相对于存储过程或编程接口)。
自动增量是为我们人类设计的。 :-(
I'm a fan of the auto-increment as primary key. I know deep in my heart that this is a cop-out, but it does make it so easy to sort data by when it was added (ORDER BY ID DESC, f'r instance).
3 columns sounds awfully harsh to humanly parse.
And that's the trade-off -- how much of the relational capability do you need, versus making THIS TABLE RIGHT HERE understandable to a human interrogating it (versus the stored-procedure or programmatic interface).
auto-increment is for us humans. :-(
一般来说,这取决于。
就我个人而言,我喜欢自动增量整数。
但是,我可以告诉您的一件事是,永远不要相信其他来源的数据作为您的密钥。 我发誓,每次我这样做,它都会回来咬我。 好吧,再也不会了!
Generally, it depends.
Personally, I like autoincrement ints.
But, one thing I can tell you is to never trust data from other sources as your key. I swear, every time I've done that it comes back to bite me. Well, never again!
我不明白这一点。
您是在谈论“自然键”,例如“姓名和出生日期”吗? 自然键如果存在的话可能是理想的,但是自然键的大多数候选者要么不是唯一的(几个人同名),要么不是恒定的(有人可以更改他们的名字)。
我更喜欢向导。 自动增量的一个潜在问题是,值(例如“订单 id”)是由数据库实例(例如“销售数据库”)分配的……这不会完全起作用(相反,您开始需要复合键),如果您曾经需要合并由多个数据库实例创建的数据(例如,来自多个销售办事处,每个销售办事处都有自己的数据库)。
I don't understand this.
Are you talking about a "natural key", e.g. "name and date of birth"? A natural key might be ideal if it exists, but most candidates for a natural key are either not unique (several people with the same name) or not constant (someone can change their name).
I prefer Guid. A potential problem with autoincrement is that the value (e.g. "order id") is assigned by the database instance (e.g. by the "sales database") ... which won't entirely work (instead you start to need compound keys) if you ever need to merge data created by more than one database instance (e.g. from several sales offices each with their own database).
RE GUID
留意这是否会是一个非常非常大的数据库、大量的负载和快速的访问。
在我上一份工作中,我们拥有包含 100 到 5 亿条记录的数据库,我们的数据库人员强烈反对使用 GUID,并提倡使用适当大小的十进制数。 他们认为(在 Oracle 下)字符串 Guid 与十进制值的内部存储大小差异会在查找中产生非常明显的差异。 (更大的键 = 要遍历的更深的树)
GUID 的随机性还显着降低了索引页的填充因子 - 这极大地增加了撕裂和磁盘 I/O。
RE GUID's
Watch out if this is going to be a really Really REALLY REALLY big database, lots of load, and fast access.
At my last job, where we had databases of 100 to 500 million records, our database guys strongly argued against GUIDs, and for an appropriately sized decimal number. They felt that (under Oracle) the size difference in the internal storage for a string Guid - vs- a decimal value would make a very noticeable difference in lookups. ( Bigger keys = deeper trees to traverse)
The random nature of GUIDs also reduces the fill-factor for index pages significantly - this dramatically increases tearing and disk I/O.
这是一个经典的“视情况而定”。 每个项目都没有唯一正确的答案。 我在不同的情况下喜欢不同的东西。 这取决于我是否使用 ORM 以及它支持什么。 这取决于整体架构(分布式或非分布式等)。 只需选择一个您认为可行的方法,然后继续争论制表符和空格即可。
This is a classic "it depends". There's no one right answer for every project. I like different things for different situations. It depends on whether I'm using an ORM and what it supports. It depends on the overall architecture (distributed or not, etc). Just pick one that you think will work and move on to arguing over tabs and spaces.
我倾向于使用选项 #1 或 #3,具体取决于大小、连接人数以及是否是多个数据库服务器的情况。
选项#2 对我来说没有多大意义。 如果这三者中的任何一个都不足以识别唯一记录,那么有可能(无需经过额外的手段)两个记录在所有三列中显示出相同的值。 如果您想对三者的任意组合强制唯一性,只需为它们添加索引即可。
I tend to use option #1 or #3 depending on the size, the number of people connecting, and whether it is a multiple database server situation or not.
Option #2 doesn't make much sense to me. If any one of the three is not enough to identify a unique record, then it's possible (without going through extra machinations) two have two records show up with the same values in all three columns. If you want to enforce uniqueness on any combination of the three, then just add an index for them.
自动递增列。 我能够使我的代码与 SQL Server 或 Oracle 无缝协作,其中一个使用身份,另一个通过我的 DAL 使用序列,这让我高兴极了。 我同意,如果您正在进行复制或发送数据以便稍后在处理后接收它,GUID 有时是必要的。
Auto increment columns. I am able to make my code work seamlessly with SQL Server or Oracle, one using identity the other using sequences through my DAL, and I couldn't be happier. I agree, GUIDs sometimes are necessary if you are doing replication or sending data away to receive it later on afer processing.
我一直使用代理键 - 一个称为“id”的自动递增整数。 即使另一个选项是显而易见的,我也可以看到很多这样做的理由:
......并且没有明智的理由不这样做:
总是欢迎我没有想到或遇到过的合理理由反对......
I've always used a surrogate key - an autoincrementing integer called 'id'. I can see plenty of reasons to do this even when another option is obvious:
...and no sensible reason not to:
sensible reasons against that I haven't thought of or come across yet are always welcomed...
我只使用了自动增量 int 或 GUID。 99% 的时间我都使用 auto-increment int。 这正是我第一次学习数据库时被教导要使用的东西,并且从未遇到过不使用它们的理由(尽管我知道 GUID 会更好的原因)。
我喜欢自动递增整数,因为它有助于提高可读性。 例如,我可以说“看一下记录 129383”,某人很容易进去找到它。 对于 GUID,这几乎是不可能做到的。
I've only use an auto-increment int or a GUID. 99% of the time I've use auto-increment int. It's just what I was taught to use when I first learned about databases and have never run into a reason not to use them (although I know of reasons why a GUID would be better).
I like auto increment ints because it helps with readability. For example I can say "take a look at record 129383" and it's pretty easy for someone to go in and find it. With a GUID that's nearly impossible to do.
除了基本的定义答案之外,什么构成一个好主键很大程度上取决于宗教和休息室争论。 如果您有一些内容并且始终会唯一地映射到单个行,那么它将作为主键正常工作。 过了这一点,还有其他考虑因素:
最后一个可能是吸引大多数人使用 GUID 或自增整数列之类的东西的原因,因为依赖地址、电话号码、名字/姓氏等东西是不够的。 我能想到的关于人的唯一不变的就是 SSN,但我什至不能 100% 确定那些永远保持独特。
希望这有助于增加一些清晰度......
Past a basic definitional answer, what constitutes a good primary key is left largely to religion and break room arguments. If you have something that is, and will always, map uniquely to an individual row, then it will work fine as a primary key. Past that point, there are other considerations:
This last one is likely what draws most people to use things like GUIDs or self-incrementing integer columns, because relying on things like addresses, phone numbers, first/last names, etc, just don't cut it. The only invariant about people I can think of is SSNs, but then I'm not even 100% certain about those remaining forever unique.
Hopefully this helps add some clarity...
我处理主键的方法(我认为是最好的)是避免使用“默认”方法。 这意味着我不再只是简单地使用一个自动递增的整数并结束它,我会查看问题并说“是否有一列或一组列始终是唯一的并且不会改变?” 如果答案是肯定的,那么我就采取这种方法。
The way I approach primary keys (and I feel is the best) is to avoid having a "default" approach. This means instead of just slapping on an auto-incrementing integer and calling it a day I look at the problem and say "is there a column or group of columns that will always be unqiue and won't change?" If the answer is yes then I take that approach.
几乎总是整数。
除了尺寸更小/处理速度更快之外,它们还有其他充分的理由。 您更愿意写下哪一个 - “404040”或“3463b5a2-a02b-4fd4-aa0f-1d3c0450026c”?
Almost always integers.
They have other good reasons besides being smaller/faster to process. Which would you rather write down - "404040" or "3463b5a2-a02b-4fd4-aa0f-1d3c0450026c"?