Is it worth adding an INT to a table whose primary key is a UNIQUEIDENTIFIER, purely for use in a JOIN table?

Posted 2024-11-27 09:51:25

I've got two tables in my SQL Server 2008 database, Users and Items

tblUser 
--------------------------
UserID    uniqueidentifier
Name      nvarchar(50)
etc..


tblItem 
--------------------------
ItemID    uniqueidentifier
ItemName      nvarchar(50)
etc..

tlmUserUserItem
----------------------------
ItemID      uniqueidentifier
UserID_A    uniqueidentifier
UserID_B    uniqueidentifier

I want to join these together in a many-to-many join table that will get huge (potentially more than a billion rows, as the application logic requires stats over shared user --> item joins).

The join table needs to be indexed on the UserID_A and UserID_B columns since the lookups are based on a user against their peers.

My question is this:

Is it worth adding an auto increment INT on the user table to use as a non primary key then use that in the join table? So the User table looks like:

tblUser 
---------------------------------
UserID         uniqueidentifier
Name           nvarchar(50)
UserIDJoinKey  int  identity(1,1)
etc..

Doing that, will it be faster to do something like:

DECLARE @ID int;
SELECT * FROM tlmUserUserItem WHERE UserIDJoinKey_A = @ID OR UserIDJoinKey_B = @ID;

when the join table looks like this:

tlmUserUserItem
-----------------------------------
ItemID             uniqueidentifier
UserIDJoinKey_A    int
UserIDJoinKey_B    int

rather than this:

tlmUserUserItem
----------------------------
ItemID      uniqueidentifier
UserID_A    uniqueidentifier
UserID_B    uniqueidentifier

Thanks in advance.

Comments (4)

∝单色的世界 2024-12-04 09:51:25

If you're having a performance problem on join operations to the table with the uniqueidentifier, first check the index fragmentation. Hot tables with a uniqueidentifier clustered index tend to get fragmented quickly. There's good info on how to do that at http://msdn.microsoft.com/en-us/library/ms189858.aspx
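As a rough sketch of that fragmentation check, the query below reads `sys.dm_db_index_physical_stats` for the join table. The `dbo` schema and the choice of `tlmUserUserItem` as the table to inspect are assumptions for illustration; substitute whichever table is actually slow.

```sql
-- Assumption: the join table lives in the dbo schema of the current database.
DECLARE @db  int = DB_ID();
DECLARE @obj int = OBJECT_ID(N'dbo.tlmUserUserItem');

-- 'LIMITED' is the cheapest scan mode; it reads only the upper levels of each index.
SELECT  i.name AS index_name,
        ps.index_type_desc,
        ps.avg_fragmentation_in_percent,
        ps.page_count
FROM    sys.dm_db_index_physical_stats(@db, @obj, NULL, NULL, 'LIMITED') AS ps
JOIN    sys.indexes AS i
        ON  i.object_id = ps.object_id
        AND i.index_id  = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;
```

A high `avg_fragmentation_in_percent` on an index with a large `page_count` is the case worth acting on, typically with `ALTER INDEX ... REORGANIZE` or `ALTER INDEX ... REBUILD`.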

If you are able to move the clustered index to the new int column and rewrite your queries to use the new int column instead of the old uniqueidentifier, your biggest benefit is going to be a lower rate of fragmentation. This helps keep your queries from slowing down after a bunch of writes to the table.

In most cases, you will not notice a huge difference in the time to process join operations on a uniqueidentifier column versus an int in MSSQL 2008 -- assuming all other things (including fragmentation) are equal.

遮了一弯 2024-12-04 09:51:25

I may be misunderstanding something along the line, but you're looking to add an identity AND a uniqueidentifier to each record? When I see you using a GUID, I assume there is either offline functionality that will be merged when the user goes online, or there is some extraneous reason that the GUID was chosen. That reason should hinder you from being able to correctly implement an identity column on each item.

If there is no specific reason why you needed to use a GUID over an identity, I'd say scrap the GUID altogether. It's bloating your tables and indexes, and slowing down your joins. If I'm misunderstanding, please let me know, and I apologize!

有深☉意 2024-12-04 09:51:25

To find the best solution, some indexing theory is needed first. SQL Server stores its clustered index data in a B+ tree of data pages, each of which holds about 8 KB of data.
Since a uniqueidentifier takes 16 bytes per key and an int takes 4 bytes per key, roughly four times as many keys fit on each index page with an int.
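The arithmetic can be checked directly. This is a back-of-the-envelope sketch only: it counts raw key bytes for the two user columns of the join table at the billion-row scale mentioned in the question, and ignores per-row and per-page overhead, so real index sizes will be larger in both cases.

```sql
-- Storage size of each key type in bytes.
SELECT  DATALENGTH(NEWID())         AS guid_key_bytes,   -- 16
        DATALENGTH(CAST(1 AS int))  AS int_key_bytes;    -- 4

-- Raw bytes for the two user key columns across one billion rows.
DECLARE @rows bigint = 1000000000;
SELECT  @rows * 2 * 16 AS guid_user_key_bytes,   -- 32000000000
        @rows * 2 * 4  AS int_user_key_bytes;    -- 8000000000
```

That is about 24 GB less key data to read, cache, and maintain, before counting the copies of those keys held in each additional index.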

To get a faster join with the int column, you will most likely have to make it the clustered index. Be aware that an additional index on such a large table may cause an unwanted performance hit on insert statements, as there is more information to write to disk.
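One way the int-keyed join table might be laid out is sketched below. The index names, the `dbo` schema, the `NOT NULL` constraints, and the exact clustered key order are assumptions for illustration, not something the question specifies.

```sql
-- Assumption: a fresh join table keyed on the int surrogate columns.
CREATE TABLE dbo.tlmUserUserItem (
    ItemID           uniqueidentifier NOT NULL,
    UserIDJoinKey_A  int              NOT NULL,
    UserIDJoinKey_B  int              NOT NULL
);

-- Clustered index led by user A, so lookups by UserIDJoinKey_A are range seeks.
CREATE CLUSTERED INDEX CIX_tlmUserUserItem_A
    ON dbo.tlmUserUserItem (UserIDJoinKey_A, UserIDJoinKey_B, ItemID);

-- Separate nonclustered index so the "OR UserIDJoinKey_B = @ID" half can also seek.
CREATE NONCLUSTERED INDEX IX_tlmUserUserItem_B
    ON dbo.tlmUserUserItem (UserIDJoinKey_B);
```

The nonclustered index is the extra write cost mentioned above: every insert into the join table has to maintain both structures.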

It all boils down to benchmarking both solutions and choosing the one that performs best for you. If the table is more read-heavy, the int column will offer better overall performance.

悲欢浪云 2024-12-04 09:51:25

@MikeM,

Personally, I would choose a uniqueidentifier over an int as the primary key of a table every time. I would, however, use NEWSEQUENTIALID() rather than NEWID() to reduce index fragmentation.
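A minimal sketch of that setup, assuming the `tblUser` table is being created from scratch (the constraint names and the `dbo` schema are hypothetical). Note that NEWSEQUENTIALID() can only be used in a DEFAULT constraint on a uniqueidentifier column, not called directly in a query.

```sql
CREATE TABLE dbo.tblUser (
    UserID  uniqueidentifier NOT NULL
            CONSTRAINT DF_tblUser_UserID DEFAULT NEWSEQUENTIALID(),
    Name    nvarchar(50)     NOT NULL,
    CONSTRAINT PK_tblUser PRIMARY KEY CLUSTERED (UserID)
);

-- UserID is omitted, so the default generates a sequential GUID for the new row.
INSERT INTO dbo.tblUser (Name) VALUES (N'Alice');
```

Because each generated value is greater than the previous one on that server, new rows land at the end of the clustered index rather than in random pages.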

The reason I make this choice is simple:

Integers are too easy to get mixed up, and on a table which has several foreign keys, the chance of "accidentally" putting a value in the wrong field is too high. You will never see the problem, because identity columns typically start at a seed of 1, so most tables tend to contain the same integer values. By using uniqueidentifier, I absolutely guarantee for all instances of a column that has a foreign key that the value I place in it is correct, because the table it references is the only table capable of having that unique identifier.

What's more... in code, your arguments would all be int, which again opens you up to the possibility of accidentally putting the wrong value in the wrong parameter and you would never know any different. By using unique identifiers instead, once again you are guaranteeing the correct reference.

Tracking down bugs caused by cross-posted integers is insidious, and the worst part is that you never know the problem has occurred until it is too late and the data has become far too corrupted for you to ever unjumble. All it takes is one cross-matched integer field and you could potentially create millions of inconsistent rows, none of which you would be aware of until you just "happen" to try to insert a value that doesn't exist in the referenced table... and by then it could be too late.
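The failure mode described above can be reproduced with a small sketch. The `Demo*` tables are hypothetical and assumed to be freshly created, so both identity columns hand out the values 1 and 2.

```sql
-- Hypothetical demo tables with int identity keys.
CREATE TABLE dbo.DemoUser (UserID int IDENTITY(1,1) PRIMARY KEY, Name nvarchar(50) NOT NULL);
CREATE TABLE dbo.DemoItem (ItemID int IDENTITY(1,1) PRIMARY KEY, ItemName nvarchar(50) NOT NULL);
CREATE TABLE dbo.DemoUserItem (
    UserID int NOT NULL REFERENCES dbo.DemoUser (UserID),
    ItemID int NOT NULL REFERENCES dbo.DemoItem (ItemID)
);

-- Two rows in each parent table, so IDs 1 and 2 exist in both.
INSERT INTO dbo.DemoUser (Name)     VALUES (N'Alice'), (N'Bob');
INSERT INTO dbo.DemoItem (ItemName) VALUES (N'Book'),  (N'Lamp');

DECLARE @UserID int = 1, @ItemID int = 2;   -- intended link: user 1 -> item 2

-- Bug: the two values are swapped. Both foreign keys are still satisfied,
-- so the row is accepted and silently records user 2 -> item 1.
INSERT INTO dbo.DemoUserItem (UserID, ItemID) VALUES (@ItemID, @UserID);
```

With uniqueidentifier keys, the same swapped insert would be rejected by the foreign key constraints, since an item's GUID would not exist in the user table.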
