SQL Server 事务表是否应该始终具有代理主键
对于已经有主键(4列的自然组合键)的大型事务表(1亿行,20GB),添加标识列并将其作为主键是否有助于性能?
当前主键(4 列的自然复合主键)可以完成这项工作,但我被告知您应该始终有一个代理键。 那么,可以通过创建标识列并将其设为主键来提高性能吗?
我正在使用 SQL Server 2008 R2 数据库。
编辑:此事务表主要连接到定义表并用于填充报告。
编辑:如果我确实添加了代理键,它就不会在任何连接中使用。将使用现有的关键字段。
编辑:该表不会有子表
For a large table of transactions (100 million rows, 20 GB) that already has a primary key (a natural composite key of 4 columns), will it help performance to add an identity column and make that the primary key?
The current primary key (the natural composite primary key of 4 columns) does the job, but I have been told that you should always have a surrogate key. So, could improve performance by creating an identity column and making that the primary key?
I'm using SQL Server 2008 R2 database.
EDIT: This transaction table is mainly joined to definition tables and used to populate reports.
EDIT: If I did add a surrogate key, it wouldn't be used in any joins. The existing key fields would be used.
EDIT: There would be no child tables to this table
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
仅添加 IDENTITY 列并为其添加新的约束和索引不太可能提高性能。该表会更大,因此扫描和查找可能需要更长的时间。还将有更多索引需要更新。当然,这完全取决于您要衡量的性能...以及您在添加新列时是否打算对代码或数据库进行其他更改。添加 IDENTITY 列而不执行其他操作可能是不明智的。
Just adding an IDENTITY column and adding a new constraint and index for it is very unlikely to improve performance. The table will be larger and therefore scans and seeks could take longer. There will also be more indexes to update. Of course it all depends what you are measuring the performance of... and whether you intend to make other changes to code or database when you add the new column. Adding an IDENTITY column and doing nothing else would probably be unwise.
仅当:
在每种情况下,表的 PK(假设聚集)将位于每个子条目/NC 条目中。因此,使聚集键更窄将会受益。
如果您只有非 NC 索引(也许有一个)并且没有子表,那么您将获得的只是
...但是您无论如何,仍然需要当前 4 列上的索引/约束 = 空间的增加。
如果您的 4 向键也捕获父表键(听起来很可能),那么您将失去重叠的优势。不过,新的索引/约束将涵盖这一点。
所以不,你可能不想这样做。
我们抛弃了超过 10 亿行的表上的代理键 (bigint),转而使用实际的 11 向键,由于结构更简单(少一个索引、每页稍微多一些行等),磁盘空间减少了 65% 以上
Only if:
In each of these cases, the PK (assumed clustered) of your table will be in each child entry/NC entry. So making the clustered key narrower will benefit.
If you have just non NC indexes (maybe one) and no child tables all you'll achieve is
...but you'll still need an index/constraint on the current 4 columns anyway = an increase in space.
If your 4 way key capture parent table keys too (sounds likely) then you'd lose the advantage of overlap. This would be covered by the new index/constraint though.
So no, you probably don't want to do it.
We threw away a surrogate key (bigint) on a billion+ row table and moved to the actual 11-way key and reduced space on disk by 65%+ because of a simpler structure (one less index, slighty more rows per page etc)
鉴于您的编辑以及该问题引发的所有对话,我建议在此表中添加 IDENTITY 列将弊大于利。
Given your edits, and all the conversation the question has sparked notwithstanding, I would suggest that adding an IDENTITY column to this table will create a lot more harm than benefit.
性能受到损害的一个地方是自然键中数据的更改。然后,更改必须发布到所有子记录。例如,假设其中一个字段是公司名称,并且该公司更改了名称,那么所有相关记录(可能有数百万条)都必须更改,但如果您使用代理键,则只需更改一条记录改变。整数连接往往更快(通常比 4 列连接快得多),并且编写连接代码通常也更快。然而,另一方面,拥有重要的四个字段可能意味着不需要经常进行连接。插入性能将受到轻微影响,并且必须生成代理键并建立索引。通常,这种影响很小以至于不会引起注意,但可能性是存在的。
四列自然键通常并不像您想象的那样是唯一的,因为数据的列数往往会随着时间的推移而变化。虽然它现在是独一无二的,但随着时间的推移它会是独一无二的吗?如果您在自然键上使用了代理键和唯一索引,但后来发现它不是唯一的,那么您所要做的就是删除唯一索引。如果是PK并且有子表,则必须完全重新设计数据库。
只有您可以决定,如果这些考虑因素中的任何一个影响您的特定数据需求,则代理键对于某些应用程序更好,而对于其他应用程序则更差。
One place where performance is hurt is on the change of the data in the natural key. The change woudl then have to promulgate to all the child records. For instance, suppose one of those fields was company name and the company changed their name, then all the related records, and there could be millions of them, would have to change but if you used a surrogate key, only one record would have to change. Integer joins tend to be faster (generally much faster than 4 column joins) and wrting the code to join is generally faster as well. However, on the other hand, having the vital four fields may mean the join isn't needed as often. Insert performanc ewilltake a slight hit as well as the surrogate key has to be generated and indexed. Usually this is so small a hit as to be unnoticalbe but the possibility is there.
A four column natural key is often not a unique as you think it will be because that number of columns the data tends to change over time. While it is unique now, will it be unique over time? If you have used a surrogate key and a unique index onteh natural key and it turns out later not to unique, then all you have to do is drop the unique index. If it is the PK and there are child tables, you have to totally redesign your database.
Only you can decide which if any of these considerations affects your specific data needs, surrogate keys are better for some applications and worse for others.
---编辑:
根据对问题的编辑,添加身份/代理键可能无法解决此问题。
--原始答案。
性能改进的一种情况是当您使用联接并且具有子表时。
如果没有代理键,您必须将所有 th4 4 个键复制到子表并连接 4 列。
连接将包括所有 4 列。
如果您使用代理键并在 4 列(现在是主键的一部分)上创建唯一约束,那么它将非常高效,并且数据仍将像以前一样进行验证。
---EDIT:
Based on the edits to the question, adding an identity/surrogate key might not be the solution to this problem.
--Original Answer.
One case of performance improvement would be when you use joins and when you have child tables.
In the absence of surrogate keys, you would have to replicate all th4 4 keys to the child table and join on the 4 columns.
The joins will include all the 4 columns..
If you use a surrogate key and create a unique constraint on the 4 columns (which are now part of the primary key), it will be both efficient and the data would still be validated as before.