Is it worth adding an INT to a table whose primary key is a UNIQUEIDENTIFIER for the purpose of a JOIN table?

Posted on 2024-11-27 09:51:25


I've got two tables in my SQL Server 2008 database, Users and Items

tblUser 
--------------------------
UserID    uniqueidentifier
Name      nvarchar(50)
etc..


tblItem 
--------------------------
ItemID    uniqueidentifier
ItemName      nvarchar(50)
etc..

tlmUserUserItem
----------------------------
ItemID      uniqueidentifier
UserID_A    uniqueidentifier
UserID_B    uniqueidentifier

I want to join these together in a many to many join table that will get huge (potentially more than a billion rows as the application logic requires stats over shared user --> item joins)

The join table needs to be indexed on the UserID_A and UserID_B columns since the lookups are based on a user against their peers.

My question is this:

Is it worth adding an auto increment INT on the user table to use as a non primary key then use that in the join table? So the User table looks like:

tblUser 
---------------------------------
UserID         uniqueidentifier
Name           nvarchar(50)
UserIDJoinKey  int  identity(1,1)
etc..

Doing that, will it be faster to do something like:

declare @ID int
select * from tblJoin where UserIDJoinKey_A = @ID or UserIDJoinKey_B = @ID

when the join table looks like this:

tlmUserUserItem
-----------------------------------
ItemID             uniqueidentifier
UserIDJoinKey_A    int
UserIDJoinKey_B    int

rather than this:

tlmUserUserItem
----------------------------
ItemID      uniqueidentifier
UserID_A    uniqueidentifier
UserID_B    uniqueidentifier

Thanks in advance.


∝单色的世界 2024-12-04 09:51:25

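If you are having performance problems with join operations on a table keyed by a uniqueidentifier, check index fragmentation first. Hot tables with a uniqueidentifier clustered index tend to fragment quickly. Details on how to do this are at http://msdn.microsoft.com/en-us/library/ms189858.aspx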
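As a rough sketch of that check, the query below reads fragmentation from the `sys.dm_db_index_physical_stats` dynamic management function. The `dbo` schema is an assumption; `tblUser` is the table from the question.

```sql
-- Sketch: report fragmentation for every index on one table.
-- Assumes the table lives in the dbo schema.
SELECT
    i.name                          AS index_name,
    ps.index_type_desc,
    ps.avg_fragmentation_in_percent,
    ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(),                    -- current database
         OBJECT_ID(N'dbo.tblUser'),  -- table to inspect
         NULL,                       -- all indexes
         NULL,                       -- all partitions
         'LIMITED') AS ps            -- cheapest scan mode
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;

-- A heavily fragmented index can then be rebuilt, for example:
-- ALTER INDEX ALL ON dbo.tblUser REBUILD;
```

Small indexes (a low `page_count`) often report high fragmentation that has little practical effect, so the number is most meaningful on the large tables.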

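If you are able to move the clustered index to the new int column and rewrite your queries to use the new int column instead of the old uniqueidentifier, your biggest benefit will be a lower fragmentation rate. This helps keep queries from slowing down after a lot of writes to the table.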
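One way this layout might look for a new table is sketched below. The constraint and index names, the `dbo` schema, and the `NOT NULL` on `Name` are assumptions made for illustration; the columns come from the question.

```sql
-- Sketch: the GUID stays the primary key, but the table is physically
-- ordered by the ever-increasing int, so new rows append at the end.
CREATE TABLE dbo.tblUser
(
    UserID        uniqueidentifier  NOT NULL
        CONSTRAINT PK_tblUser PRIMARY KEY NONCLUSTERED,
    Name          nvarchar(50)      NOT NULL,
    UserIDJoinKey int IDENTITY(1,1) NOT NULL
);

-- The clustered index goes on the int surrogate instead of the GUID.
CREATE UNIQUE CLUSTERED INDEX CIX_tblUser_UserIDJoinKey
    ON dbo.tblUser (UserIDJoinKey);
```

On an existing table, the clustered primary key would first have to be dropped and recreated as nonclustered, which rewrites the whole table and is best tested on a copy.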

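In most cases you will not notice a huge difference in the time MSSQL 2008 takes to process joins on a uniqueidentifier column versus an int, assuming all else (including fragmentation) is equal.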


遮了一弯 2024-12-04 09:51:25

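I might be misunderstanding something, but are you looking to add both an identity and a uniqueidentifier to each record? When I see a GUID being used, I assume either there is offline functionality whose data gets merged when the user comes online, or the GUID was chosen for some unrelated reason. That same reason should stand in the way of properly implementing an identity column on each item.

If there is no specific reason to use a GUID over an identity, I'd recommend scrapping the GUID altogether. It bloats your tables and indexes and slows down your joins. If I've misunderstood, please let me know, and I apologize!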


有深☉意 2024-12-04 09:51:25

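To find the best solution, you first need a bit of indexing theory. SQL Server stores its clustered index data in a B+ tree of data pages, and each page holds roughly 8K of data.
Given that a uniqueidentifier takes 16 bytes per key and an int takes 4 bytes per key, an index page can hold roughly 4 times as many keys with an int (ignoring per-row overhead).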
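The arithmetic behind that ratio can be sketched as follows. It assumes 8,096 usable bytes per page (8,192 minus the 96-byte page header) and ignores per-row overhead, so the figures are upper bounds rather than measured values.

```sql
-- Back-of-the-envelope key density, ignoring row headers and the slot array.
SELECT
    8096 / 16  AS guid_keys_per_page,   -- 506
    8096 / 4   AS int_keys_per_page,    -- 2024
    3 * 16     AS guid_join_row_bytes,  -- ItemID + UserID_A + UserID_B = 48
    16 + 4 + 4 AS int_join_row_bytes;   -- ItemID + two int keys = 24
```

Because `ItemID` stays a uniqueidentifier in both designs, the join table row shrinks by about half rather than by a factor of four, which matters at a billion rows.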

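To get faster joins on the int column, you will most likely have to make it the clustered index. Be aware that adding an extra index to a table this large can cause an unwanted performance hit on insert statements, because there is more information to write to disk.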

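It all boils down to benchmarking both solutions and choosing the one that performs best for you. If the table is read-heavy, the int column should offer better overall performance.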
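A minimal way to run such a benchmark is sketched below. It assumes both variants of the join table have been loaded with the same data under the hypothetical names `tlmUserUserItem_Guid` and `tlmUserUserItem_Int`, and that `tblUser` already carries the `UserIDJoinKey` column.

```sql
-- Sketch: compare logical reads and elapsed time for the two designs.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

DECLARE @UserGuid uniqueidentifier;
DECLARE @UserInt  int;

-- Pick one user and capture both of its keys.
SELECT TOP (1) @UserGuid = UserID, @UserInt = UserIDJoinKey
FROM dbo.tblUser;

-- Variant 1: uniqueidentifier keys (hypothetical table name).
SELECT ItemID, UserID_A, UserID_B
FROM dbo.tlmUserUserItem_Guid
WHERE UserID_A = @UserGuid OR UserID_B = @UserGuid;

-- Variant 2: int keys (hypothetical table name).
SELECT ItemID, UserIDJoinKey_A, UserIDJoinKey_B
FROM dbo.tlmUserUserItem_Int
WHERE UserIDJoinKey_A = @UserInt OR UserIDJoinKey_B = @UserInt;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The Messages tab then shows logical reads and CPU/elapsed time per statement. Repeating the run with a warm cache and with realistic row counts gives a fairer comparison than a single execution.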


悲欢浪云 2024-12-04 09:51:25

@MikeM,

Personally, I would choose a uniqueidentifier over an int as a table's primary key every time. I would, however, use NEWSEQUENTIALID() rather than NEWID() to reduce index fragmentation.
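As a sketch of that suggestion, the table below lets SQL Server generate sequential GUIDs through a column default. The constraint names and the `dbo` schema are assumptions; note that `NEWSEQUENTIALID()` can only be used in a `DEFAULT` constraint, not called directly in a query.

```sql
-- Sketch: sequential GUIDs as a clustered primary key.
CREATE TABLE dbo.tblItem
(
    ItemID   uniqueidentifier NOT NULL
        CONSTRAINT DF_tblItem_ItemID DEFAULT NEWSEQUENTIALID()
        CONSTRAINT PK_tblItem PRIMARY KEY CLUSTERED,
    ItemName nvarchar(50)     NOT NULL
);

-- ItemID is omitted, so the default supplies the next sequential GUID.
INSERT INTO dbo.tblItem (ItemName) VALUES (N'Example item');
```

This only helps when the database generates the key. GUIDs created in application code are still effectively random and will fragment the index as before.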

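My reason for this choice is simple:

Integers are too easy to mix up, and on a table with several foreign keys the chance of "accidentally" putting a value in the wrong field is too high. You will never see the problem, because all identity columns start from a seed of 1, so most tables tend to contain matching integer values. By using a uniqueidentifier, I am guaranteed that every value I put in a foreign key column is correct, because the table it references is the only one that can contain that unique identifier.

What's more, in code your parameters will all be ints, which again opens you up to accidentally passing the wrong value to the wrong parameter without ever knowing. With unique identifiers you are again guaranteed a correct reference.

Bugs caused by cross-posted integers are insidious to track down, and worst of all you never know the problem has occurred until it is too late and the data has become too corrupted to untangle. All it takes is one cross-matched integer field and you could create millions of inconsistent rows, none of which you would notice unless you "happen" to try to insert a value that doesn't exist in the referenced table. By then it could be too late.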
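The failure mode described above can be reproduced with a small sketch. The `Demo*` tables are hypothetical and exist only for this illustration.

```sql
-- Sketch: swapped int keys pass the foreign key checks unnoticed.
CREATE TABLE dbo.DemoUser
(
    UserID int IDENTITY(1,1) PRIMARY KEY,
    Name   nvarchar(50) NOT NULL
);

CREATE TABLE dbo.DemoItem
(
    ItemID   int IDENTITY(1,1) PRIMARY KEY,
    ItemName nvarchar(50) NOT NULL
);

CREATE TABLE dbo.DemoUserItem
(
    UserID int NOT NULL REFERENCES dbo.DemoUser (UserID),
    ItemID int NOT NULL REFERENCES dbo.DemoItem (ItemID)
);

INSERT INTO dbo.DemoUser (Name)     VALUES (N'Alice'),  (N'Bob');     -- IDs 1, 2
INSERT INTO dbo.DemoItem (ItemName) VALUES (N'Widget'), (N'Gadget');  -- IDs 1, 2

-- Intended: user 1 (Alice) with item 2 (Gadget).
-- The arguments are swapped, yet both values exist in the referenced
-- tables, so the insert succeeds and links Bob to Widget instead.
INSERT INTO dbo.DemoUserItem (UserID, ItemID) VALUES (2, 1);
```

With uniqueidentifier keys the same mistake would be rejected by the foreign key constraint, since an item's GUID does not appear in the user table.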
