Database lookup/dictionary tables - design question - hardcoding GUIDs

Posted 2024-10-30 04:47:13


The following question came to mind while I was refactoring a complex stored procedure that was a bottleneck in my database. Let me introduce the topic. Suppose we have a lookup/dictionary table like the one below (it contains a GUID that other tables reference as a foreign key, and a human-readable name):

CREATE TABLE [dbo].[PlayerStatus](
    [PlayerStatusId] [uniqueidentifier] NOT NULL,
    [PlayerStatusName] [nvarchar](50) NOT NULL,
 CONSTRAINT [PK_PlayerStatus] PRIMARY KEY CLUSTERED 
(
    [PlayerStatusId] ASC
))

and there's a Player table:

CREATE TABLE [dbo].[Player](
    [PlayerId] [uniqueidentifier] NOT NULL,
    [PlayerStatusId] [uniqueidentifier] NOT NULL,
    [PlayerName] [nchar](10) NOT NULL,
    [PlayerSurname] [nchar](10) NOT NULL,
 CONSTRAINT [PK_Player] PRIMARY KEY CLUSTERED ([PlayerId] ASC)) ON [PRIMARY]

Pretty straightforward.

Let's say that somewhere in the code there's a huge query that accesses a large number of tables:

SELECT ...
FROM Player JOIN PlayerStatus ON Player.PlayerStatusId = PlayerStatus.PlayerStatusId
.....
WHERE PlayerStatus.PlayerStatusName = 'Active' ....

Now, in my stored procedure, according to the execution plan, the Player table was pulled into the result set at the very beginning. Assuming that this is a very large table with millions of rows, a hash join between Player and PlayerStatus could be time-consuming. To optimize this query, I could rewrite it to something like this:

SELECT ...
FROM Player .....
WHERE Player.PlayerStatusId = '46bb6a12-4cd9-4b6c-84c2-7444f5f45eb6' ....

And that's exactly what I did in my bottleneck procedure. That way, I removed 4 similar lookup/dictionary tables that contained different types of statuses. To my surprise, I managed to improve performance by 50%, even though I thought such tables wouldn't affect performance at all. But that's a side plot. My question is: what do you think of hardcoding GUIDs?

EDIT

I do have PK indexes on PlayerStatus.PlayerStatusId and Player.PlayerId.
I do have an FK constraint on Player.PlayerStatusId:

ALTER TABLE [dbo].[Player] WITH CHECK
ADD CONSTRAINT [FK_Player_PlayerStatus]
FOREIGN KEY ([PlayerStatusId])
REFERENCES [dbo].[PlayerStatus] ([PlayerStatusId])

The Player table contains about 2 million records; the PlayerStatus table contains about 25 records.


凉城凉梦凉人心 2024-11-06 04:47:13

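Keep in mind that a clustered index on a GUID column is not recommended. Change the clustered index to a regular (nonclustered) PK index and run the query again. You may notice a difference.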
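A minimal T-SQL sketch of this suggestion, assuming the dbo.Player table and the PK_Player constraint from the question: the GUID stays the primary key but is declared NONCLUSTERED, so rows are no longer physically ordered by random GUID values. Leaving the table as a heap (or picking some other clustered key) is an assumption of this sketch, and whether it actually helps should be confirmed against the execution plan.

```sql
-- Assumes dbo.Player exists as defined in the question, with a clustered PK_Player.
-- Dropping the PK fails if another table has a foreign key referencing it;
-- in the question's schema nothing references PK_Player.
ALTER TABLE dbo.Player DROP CONSTRAINT PK_Player;

-- Recreate the same key as NONCLUSTERED. Without another clustered index,
-- dbo.Player is now stored as a heap.
ALTER TABLE dbo.Player
    ADD CONSTRAINT PK_Player PRIMARY KEY NONCLUSTERED (PlayerId);
```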


只等公子 2024-11-06 04:47:13

Hardcoding GUIDs or other numeric IDs may not look very elegant, but in my experience it sometimes turns out to be very beneficial for performance.

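Your example is very simple, but if you have a more complex query with many joins, removing even one join can speed it up. An example from your code is removing the join to PlayerStatus and filtering on PlayerStatusId in the Player table instead of on PlayerStatusName in PlayerStatus.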
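One middle ground between joining on the name and pasting a literal GUID, shown as a hedged T-SQL sketch: look up the status Id by name once against the small lookup table, keep it in a variable, and filter Player on its own column without joining PlayerStatus. The variable name @ActiveStatusId is hypothetical; 'Active' is the status name from the question. Because a local variable's value is not known when the batch is compiled, OPTION (RECOMPILE) is added so the optimizer can use the actual value for its row estimates.

```sql
-- Assumes dbo.Player and dbo.PlayerStatus exist as defined in the question.
DECLARE @ActiveStatusId uniqueidentifier;  -- hypothetical variable name

-- Resolve the Id from the human-readable name once (about 25 rows to search).
SELECT @ActiveStatusId = PlayerStatusId
FROM dbo.PlayerStatus
WHERE PlayerStatusName = N'Active';

-- Filter the large table on its own column; no join to PlayerStatus remains.
-- If no 'Active' row exists, @ActiveStatusId is NULL and no rows are returned.
SELECT p.PlayerId, p.PlayerName, p.PlayerSurname
FROM dbo.Player AS p
WHERE p.PlayerStatusId = @ActiveStatusId
OPTION (RECOMPILE);  -- compile using the variable's current value
```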

There are two more things to consider when it comes to hardcoding GUIDs/IDs:

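A GUID/ID is usually a table's PK and is referenced by FKs, so it is harder to change than, e.g., a status name. Changing PlayerStatusName in your example from 'Active' to 'Running' would break every query that filters on PlayerStatusName. Using GUIDs/IDs therefore ensures that queries rely on stable key columns (PK, FK).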
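Using GUIDs/IDs in queries does require some discipline, for example if you have several environments: you need to make sure the relevant IDs in the dictionary table (i.e. PlayerStatus) are identical in every database instance.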
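A hedged sketch of one way to impose that discipline: a seed script, kept with the schema and run unchanged in every environment, that upserts the lookup rows with fixed Ids. Only the 'Active' row and the GUID quoted in the question are shown; further rows would follow the same pattern. The script assumes no environment already holds 'Active' under a different Id, since the question's table has no unique constraint on the name.

```sql
-- Hypothetical seed script for dbo.PlayerStatus (table from the question).
-- Running it in dev, test and production keeps the well-known Ids identical.
MERGE dbo.PlayerStatus AS tgt
USING (VALUES
    (CAST('46bb6a12-4cd9-4b6c-84c2-7444f5f45eb6' AS uniqueidentifier), N'Active')
    -- further (PlayerStatusId, PlayerStatusName) pairs would be listed here
) AS src (PlayerStatusId, PlayerStatusName)
ON tgt.PlayerStatusId = src.PlayerStatusId
-- Same Id, different name: correct the name, keep the Id.
WHEN MATCHED AND tgt.PlayerStatusName <> src.PlayerStatusName THEN
    UPDATE SET PlayerStatusName = src.PlayerStatusName
-- Id missing in this environment: insert it with the fixed value.
WHEN NOT MATCHED BY TARGET THEN
    INSERT (PlayerStatusId, PlayerStatusName)
    VALUES (src.PlayerStatusId, src.PlayerStatusName);
```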


画中仙 2024-11-06 04:47:13

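Answer: don't hardcode GUIDs. So where do the criteria come from? The status table. How do you specify them? If the string is immutable, fine, use it if you have to, though I prefer a logical IsActive flag. If performance is unacceptable, revisit the design using the information discussed here.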

Do you have foreign key constraints?

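If you are doing an inner join and there is no foreign key constraint, every row has to be matched (to satisfy the logic of the inner join), whether or not the columns from the joined table are used.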

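If you have a foreign key constraint to a unique column (here, obviously, the PK), the optimizer knows there can be only one match and can skip the matching entirely, because it knows the join will be satisfied.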

Constraints are your friend.
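A hedged T-SQL sketch of what this answer describes. SQL Server can drop an inner join to PlayerStatus from the plan only when the foreign key is trusted, Player.PlayerStatusId is NOT NULL, and the query reads no columns from PlayerStatus; a filter on PlayerStatusName, as in the question, still needs the join. The constraint name is the one from the question's EDIT.

```sql
-- Is the question's FK enabled and trusted? WITH CHECK at creation makes it trusted,
-- but disabling and re-enabling it without CHECK leaves it untrusted.
SELECT name, is_disabled, is_not_trusted
FROM sys.foreign_keys
WHERE name = N'FK_Player_PlayerStatus';

-- If is_not_trusted = 1, revalidate the existing rows so the optimizer can rely on it.
ALTER TABLE dbo.Player WITH CHECK CHECK CONSTRAINT FK_Player_PlayerStatus;

-- Only Player columns are referenced, so with a trusted FK on a NOT NULL column
-- the plan can skip PlayerStatus entirely.
SELECT p.PlayerId, p.PlayerName
FROM dbo.Player AS p
INNER JOIN dbo.PlayerStatus AS s
    ON s.PlayerStatusId = p.PlayerStatusId;
```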

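As another answer suggests, you also need an index on the status foreign key column, and I would also look at the execution plan to see exactly what is going on.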
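A sketch of both suggestions, assuming the question's tables. The index name IX_Player_PlayerStatusId is hypothetical, and whether the optimizer actually uses the index depends on how selective the filtered status is among the roughly 2 million players. SET STATISTICS IO and TIME print reads and timings to the Messages output, and SET STATISTICS XML returns the actual plan, which shows whether a hash join is still chosen.

```sql
-- Hypothetical index supporting lookups of players by status.
CREATE NONCLUSTERED INDEX IX_Player_PlayerStatusId
    ON dbo.Player (PlayerStatusId);

SET STATISTICS IO ON;    -- logical/physical reads per table
SET STATISTICS TIME ON;  -- parse/compile and execution times
SET STATISTICS XML ON;   -- actual execution plan as XML

SELECT p.PlayerId, p.PlayerName
FROM dbo.Player AS p
INNER JOIN dbo.PlayerStatus AS s
    ON s.PlayerStatusId = p.PlayerStatusId
WHERE s.PlayerStatusName = N'Active';

SET STATISTICS XML OFF;
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```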

As for hardcoding GUIDs, it's unusual, because GUIDs are usually fairly anonymous.

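Also, I would normally have a logical column such as IsActive on the status, because you may have several "statuses" that are logically equivalent in some contexts, e.g. Status IN ('Closed', 'Locked', 'Suspending', '') => IsInactive = 1, while only ('Locked') => IsLocked = 1. FWIW, I tend not to use a single status string at all, but physical flags on the account for the individual states, and then logical combinations of those flags as logical flags for query criteria.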
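A minimal sketch of the logical-flag idea applied to the question's PlayerStatus table. The column and default-constraint names are hypothetical, and the status names are the ones used in this answer. The GO separator (recognized by SSMS and sqlcmd, not part of T-SQL itself) is needed because a batch cannot reference columns that an earlier statement in the same batch adds.

```sql
-- Hypothetical flag columns; the defaults fill existing rows with 0.
ALTER TABLE dbo.PlayerStatus
    ADD IsInactive bit NOT NULL CONSTRAINT DF_PlayerStatus_IsInactive DEFAULT (0),
        IsLocked   bit NOT NULL CONSTRAINT DF_PlayerStatus_IsLocked   DEFAULT (0);
GO

-- Map individual statuses to their logical meaning once, in the lookup table.
UPDATE dbo.PlayerStatus
SET IsInactive = CASE WHEN PlayerStatusName IN (N'Closed', N'Locked', N'Suspending') THEN 1 ELSE 0 END,
    IsLocked   = CASE WHEN PlayerStatusName = N'Locked' THEN 1 ELSE 0 END;

-- Queries then filter on the meaning rather than on a particular name or GUID.
SELECT p.PlayerId, p.PlayerName
FROM dbo.Player AS p
INNER JOIN dbo.PlayerStatus AS s
    ON s.PlayerStatusId = p.PlayerStatusId
WHERE s.IsInactive = 1;
```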

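I re-read what you posted. As for your execution plan, it will change depending on the statistics on the tables. I very much doubt the plan is the same for 100 players as for 1 million players, so be sure to check that before attempting any premature optimization. Also, when testing, make sure the statistics are up to date: sometimes a plan that works well for a million rows can perform badly for a single row.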
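The statistics checks this refers to, as a T-SQL sketch against the question's Player table: first see when each statistics object was last updated, then refresh it before comparing plans. FULLSCAN reads every row, which is more accurate but slower than the default sampling on a 2-million-row table.

```sql
-- When was each statistics object on dbo.Player last updated?
SELECT s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.Player');

-- Rebuild all statistics on the large table from a full scan of its rows.
UPDATE STATISTICS dbo.Player WITH FULLSCAN;

-- Alternatively, refresh statistics that need it across the whole current database.
EXEC sp_updatestats;
```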

