A co-worker and I are arguing about which way is better for generating primary keys that are GUIDs.
We are using .NET 4.0 with Entities 4 and using stored procs to make select/insert/updates.
He wants to create the GUID primary key in code, using the Guid class and/or some custom sequential GUID class, and pass it in as part of the insert.
I want the GUID to be created by SQL Server on insert using either newid() or newsequentialid().
My argument against his way is that if you have to do multiple inserts, you have to make a round trip to get a GUID for each insert so that you can maintain the relationships for your foreign key constraints. That means several round trips for each set of inserts.
His argument against letting SQL Server generate the key is that he doesn't have access to the key BEFORE the insert happens and has to wait for the insert to complete to get the primary key GUID back for use in other parts of the code. With the SQL Server approach, on the other hand, you can make one connection to a stored proc and have it handle all the inserts.
So, which method is better for single inserts? Which method is better for multiple inserts in a transaction?
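To make the trade-off concrete, here is a minimal sketch of the client-generated-key approach. It is written in Python with an in-memory SQLite database purely as a stand-in for .NET and SQL Server; the `parent` and `child` tables and their columns are hypothetical. It shows that when the key is created in code, the parent and child rows can be sent in one transaction without reading a key back first.

```python
import sqlite3
import uuid

# In-memory SQLite stands in for SQL Server; table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the FK relationship
conn.executescript("""
    CREATE TABLE parent (id TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE child  (id TEXT PRIMARY KEY,
                         parent_id TEXT NOT NULL REFERENCES parent(id),
                         name TEXT NOT NULL);
""")

# Keys are generated on the client, so the foreign key value is known
# before anything is sent to the database.
parent_id = str(uuid.uuid4())
children = [(str(uuid.uuid4()), parent_id, name) for name in ("first", "second")]

# One transaction, no read-back of generated keys between the inserts.
with conn:
    conn.execute("INSERT INTO parent (id, name) VALUES (?, ?)", (parent_id, "order"))
    conn.executemany(
        "INSERT INTO child (id, parent_id, name) VALUES (?, ?, ?)", children
    )
```

The server-generated alternative needs some way to hand the new keys back, either to the client or to the next statement inside the stored procedure.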
GUIDs may seem to be a natural choice for your primary key - and if you really must, you could probably argue for using one as the PRIMARY KEY of the table. What I'd strongly recommend against is using the GUID column as the clustering key, which SQL Server does by default unless you specifically tell it not to.
You really need to keep two issues apart:
1) the primary key is a logical construct - one of the candidate keys that uniquely and reliably identifies every row in your table. This can be anything, really - an INT, a GUID, a string - pick what makes most sense for your scenario.
2) the clustering key (the column or columns that define the "clustered index" on the table) - this is a physical storage-related thing, and here, a small, stable, ever-increasing data type is your best pick - INT or BIGINT as your default option.
By default, the primary key on a SQL Server table is also used as the clustering key - but it doesn't need to be that way! I've personally seen massive performance gains when breaking up a previous GUID-based Primary / Clustered Key into two separate keys - the primary (logical) key on the GUID, and the clustering (ordering) key on a separate INT IDENTITY(1,1) column.
As Kimberly Tripp - the Queen of Indexing - and others have stated a great many times - a GUID as the clustering key isn't optimal, since due to its randomness, it will lead to massive page and index fragmentation and to generally bad performance.
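The fragmentation effect can be illustrated with a toy model. The sketch below is a rough Python simulation, not SQL Server's actual storage engine: the page capacity of 100 rows, the 50/50 split rule, and the use of random 128-bit integers as a stand-in for newid() values are all assumptions made for illustration.

```python
import bisect
import random

PAGE_CAPACITY = 100  # hypothetical rows per page


def simulate_inserts(keys, capacity=PAGE_CAPACITY):
    """Insert keys into a toy clustered index; return (page_count, split_count)."""
    pages = []  # each page is a sorted list of keys
    lows = []   # lows[i] == pages[i][0], kept in sync for binary search
    splits = 0
    for key in keys:
        if not pages:
            pages.append([key])
            lows.append(key)
            continue
        # Find the page whose key range should hold this key.
        i = max(bisect.bisect_right(lows, key) - 1, 0)
        page = pages[i]
        if len(page) < capacity:
            bisect.insort(page, key)
            lows[i] = page[0]
            continue
        if i == len(pages) - 1 and key > page[-1]:
            # Ever-increasing key: start a fresh page at the end, no split.
            pages.append([key])
            lows.append(key)
            continue
        # Insert into the middle of a full page: split it in half.
        splits += 1
        mid = capacity // 2
        left, right = page[:mid], page[mid:]
        pages[i] = left
        pages.insert(i + 1, right)
        lows.insert(i + 1, right[0])
        bisect.insort(right if key >= right[0] else left, key)
        lows[i] = pages[i][0]
    return len(pages), splits


N = 20_000
rng = random.Random(42)  # fixed seed so the run is repeatable

seq_pages, seq_splits = simulate_inserts(range(N))
rand_pages, rand_splits = simulate_inserts(rng.getrandbits(128) for _ in range(N))

print(f"sequential keys: {seq_pages} pages, {seq_splits} splits")
print(f"random keys:     {rand_pages} pages, {rand_splits} splits")
```

In this model the ever-increasing keys fill every page completely and never split one, while the random keys keep splitting full pages and end up spread over noticeably more, partly empty pages.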
Yes, I know - there's newsequentialid() in SQL Server 2005 and up - but even that is not truly and fully sequential and thus also suffers from the same problems as the GUID - just a bit less prominently so. If you insist on GUID, then at least use the newsequentialid() method on the server!
Then there's another issue to consider: the clustering key on a table will be added to each and every entry on each and every non-clustered index on your table as well - thus you really want to make sure it's as small as possible. Typically, an INT with 2+ billion rows should be sufficient for the vast majority of tables - and compared to a GUID as the clustering key, you can save yourself hundreds of megabytes of storage on disk and in server memory.
Quick calculation - using INT vs. GUID as Primary and Clustering Key:
TOTAL: 25 MB vs. 106 MB - and that's just on a single table!
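The arithmetic behind a comparison like this can be sketched as follows. The inputs are assumptions, since the breakdown is not shown above: 1,000,000 rows and 6 nonclustered indexes are hypothetical figures, while 4 bytes for an INT and 16 bytes for a GUID are the sizes of those SQL Server types. Only the key bytes are counted; row and page overhead are ignored.

```python
ROWS = 1_000_000           # assumed row count
NONCLUSTERED_INDEXES = 6   # assumed number of nonclustered indexes
KEY_BYTES = {"INT": 4, "GUID": 16}
MB = 1024 * 1024


def key_storage_mb(key_bytes, rows=ROWS, indexes=NONCLUSTERED_INDEXES):
    """Return (base table, nonclustered indexes, total) key storage in MB."""
    base_table = rows * key_bytes / MB
    # Every nonclustered index entry carries a copy of the clustering key.
    nonclustered = indexes * rows * key_bytes / MB
    return base_table, nonclustered, base_table + nonclustered


for name, size in KEY_BYTES.items():
    base, nc, total = key_storage_mb(size)
    print(f"{name:4}: base {base:6.2f} MB, indexes {nc:6.2f} MB, total {total:6.2f} MB")
```

Under these assumptions the totals come to roughly 26.7 MB for INT and 106.8 MB for GUID, the same order of magnitude as the figures quoted above.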
Some more food for thought - excellent stuff by Kimberly Tripp - read it, read it again, digest it! It's the SQL Server indexing gospel, really.
Marc
When I have questions like these I say to myself "SQL Server is good at sets, so let's let it do what it's good at" and sometimes "1 is just a specific case of N".
The single insert time will be the same for either of your approaches for a synchronous SQL call. However, "his" approach will give you more problems with seek time down the line, because his sequential GUID method won't be as good as SQL Server's (and you will probably lose the global uniqueness). It will also split your code base when you inevitably need to do multiple inserts.
If you are arguing a set-based insert (insert / select) vs. a single-row insert (insert into), the set-based one is going to win on multiple inserts, because the trip back to the client is going to be the expensive part.
If this were me, I would create an SP that takes a serialized collection of the objects to insert and does an insert / select with an OUTPUT clause (check out "Example B. Using OUTPUT with identity and computed columns" on this page). Let SQL Server create the GUID (if you are stuck on it), and then either return it to the client or run the next statement in the SP to insert child rows based on the output table your insert generated.
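A rough sketch of the set-based idea, again in Python with in-memory SQLite standing in for SQL Server and a stored procedure. SQLite's randomblob() plays the role of a server-side newid(), and because this sketch does not use an OUTPUT clause, the child rows find their parents by joining on a client-supplied reference column instead. All table and column names here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders      (id TEXT PRIMARY KEY, client_ref TEXT UNIQUE, customer TEXT);
    CREATE TABLE order_lines (order_id TEXT REFERENCES orders(id), product TEXT);
    -- Staging tables play the role of the serialized collection passed to the SP.
    CREATE TEMP TABLE incoming_orders (client_ref TEXT, customer TEXT);
    CREATE TEMP TABLE incoming_lines  (client_ref TEXT, product TEXT);
""")

conn.executemany("INSERT INTO incoming_orders VALUES (?, ?)",
                 [("a", "Alice"), ("b", "Bob")])
conn.executemany("INSERT INTO incoming_lines VALUES (?, ?)",
                 [("a", "widget"), ("a", "gadget"), ("b", "gizmo")])

with conn:  # one transaction for parents and children
    # Set-based parent insert; the database generates each key.
    conn.execute("""
        INSERT INTO orders (id, client_ref, customer)
        SELECT lower(hex(randomblob(16))), client_ref, customer
        FROM incoming_orders
    """)
    # Set-based child insert; the generated keys are looked up by joining
    # on client_ref (a stand-in for reading them from an OUTPUT table).
    conn.execute("""
        INSERT INTO order_lines (order_id, product)
        SELECT o.id, l.product
        FROM incoming_lines AS l
        JOIN orders AS o ON o.client_ref = l.client_ref
    """)
```

Both inserts are single statements regardless of how many rows are involved, so the number of statements does not grow with the number of rows.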