Database lookup/dictionary tables - design question - hardcoded GUIDs
The following question came up while I was refactoring a complex stored procedure that was a bottleneck in my database. Let me introduce the topic. Suppose we have a lookup/dictionary table like this (it contains a GUID, which other tables reference as a foreign key, and a human-readable name):
CREATE TABLE [dbo].[PlayerStatus](
[PlayerStatusId] [uniqueidentifier] NOT NULL,
[PlayerStatusName] [nvarchar](50) NOT NULL,
CONSTRAINT [PK_PlayerStatus] PRIMARY KEY CLUSTERED
(
[PlayerStatusId] ASC
))
and there is a Player table:
CREATE TABLE [dbo].[Player](
[PlayerId] [uniqueidentifier] NOT NULL,
[PlayerStatusId] [uniqueidentifier] NOT NULL,
[PlayerName] [nchar](10) NOT NULL,
[PlayerSurname] [nchar](10) NOT NULL,
CONSTRAINT [PK_Player] PRIMARY KEY CLUSTERED ( [PlayerId] ASC )) ON [PRIMARY]
Pretty straightforward.
Let's say that somewhere in the code there is a huge query that accesses a great many tables:
SELECT ...
FROM Player JOIN PlayerStatus ON Player.PlayerStatusId = PlayerStatus.PlayerStatusId
.....
WHERE PlayerStatus.PlayerStatusName = 'Active' ....
Now, in my stored procedure, according to the execution plan, the Player table was pulled into the result set at the very beginning. Assuming that this is a very large table with millions of rows, a hash join between Player and PlayerStatus could be time consuming. To optimize this query, I could rewrite it to something like this:
SELECT ...
FROM Player .....
WHERE Player.PlayerStatusId = '46bb6a12-4cd9-4b6c-84c2-7444f5f45eb6' ....
And that's exactly what I did in my bottleneck procedure. That way, I removed from the query 4 similar lookup/dictionary tables that contained different types of statuses. To my surprise, I managed to improve performance by 50%, even though I thought that such tables wouldn't impact performance at all. But that's a side plot. My question is: what do you think of hardcoding GUIDs?
EDIT
- I do have PK indexes on PlayerStatus.PlayerStatusId and Player.PlayerId
- I do have an FK constraint on Player.PlayerStatusId:
ALTER TABLE [dbo].[Player] WITH CHECK
ADD CONSTRAINT [FK_Player_PlayerStatus] FOREIGN KEY([PlayerStatusId])
REFERENCES [dbo].[PlayerStatus] ([PlayerStatusId])
- The Player table contains about 2 million records; the PlayerStatus table contains about 25 records
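Please keep in mind that clustered indexes are generally not recommended on GUID columns: randomly generated GUIDs land all over the index, which tends to cause page splits and fragmentation. Convert your clustered primary keys to nonclustered primary keys and run your queries again. You might notice the difference.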
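As a rough sketch of that conversion for the Player table from the question, assuming no foreign keys reference Player.PlayerId and that the table can be rebuilt in a maintenance window:

```sql
-- Sketch only: assumes nothing references dbo.Player(PlayerId).
-- Dropping a clustered PK rebuilds the table as a heap,
-- which can take a while on about 2 million rows.
ALTER TABLE [dbo].[Player] DROP CONSTRAINT [PK_Player];

-- Recreate the primary key as a nonclustered index.
ALTER TABLE [dbo].[Player]
ADD CONSTRAINT [PK_Player] PRIMARY KEY NONCLUSTERED ([PlayerId] ASC);
```

After this the table is a heap unless a clustered index is created on some other column. Which column would suit that role depends on the data and is not something the question tells us.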
Hard coding GUIDs or other numeric IDs may not look very elegant, but in my experience it has sometimes proved to be quite beneficial as far as performance is concerned.
Your example is quite simple, but if you had a more complex query with many joins, removing one join could speed up the query. An example in your code would be removing the join with PlayerStatus and filtering on PlayerStatusId from the Player table instead of on PlayerStatusName from PlayerStatus.
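To make that concrete, here is a sketch of two variants against the tables from the question. The column list and the @ActiveStatusId variable are illustrative assumptions. The second variant is a possible middle ground: it keeps the join out of the big query without embedding the literal GUID in the code.

```sql
-- Variant 1: filter on the ID stored in Player; no join to PlayerStatus.
SELECT p.PlayerId, p.PlayerName, p.PlayerSurname
FROM dbo.Player AS p
WHERE p.PlayerStatusId = '46bb6a12-4cd9-4b6c-84c2-7444f5f45eb6';

-- Variant 2: resolve the ID once from the small lookup table, then filter on it.
DECLARE @ActiveStatusId uniqueidentifier;

SELECT @ActiveStatusId = PlayerStatusId
FROM dbo.PlayerStatus
WHERE PlayerStatusName = N'Active';

SELECT p.PlayerId, p.PlayerName, p.PlayerSurname
FROM dbo.Player AS p
WHERE p.PlayerStatusId = @ActiveStatusId;
```

Note that with a local variable the optimizer usually estimates row counts from average density rather than from the specific value, so the plan for variant 2 may differ from the plan for the literal.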
There are two more things to consider when it comes to hard coding GUIDs/IDs:
Answer: don't hardcode the GUID. That raises the question of where the criterion lives: in the status table. How do you specify it? If the status name string is immutable, fine, use that if you want; I prefer a logical IsActive flag. If performance is unacceptable, revisit it using the information discussed here.
Do you have a foreign key constraint?
If you are doing an inner join and DO NOT have a foreign key constraint, each row has to be matched (to satisfy the logic of the inner join) regardless of whether any column from the joined table is consumed.
If you have a foreign key constraint to a unique column (a PK, obviously), the optimizer knows that there is one and only one matching row, and, as long as the query uses no columns from the lookup table, it can eliminate the need to match because it knows the join will be satisfied.
Constraints are your friend.
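One thing worth checking here: the optimizer only relies on a foreign key that is enabled and trusted. A minimal sketch, using the constraint name from the question, for inspecting it and re-validating it if needed:

```sql
-- is_not_trusted = 1 means the constraint was not validated against existing rows
-- (for example, it was added or re-enabled WITH NOCHECK).
SELECT name, is_disabled, is_not_trusted
FROM sys.foreign_keys
WHERE parent_object_id = OBJECT_ID(N'dbo.Player');

-- Re-validate the existing rows so the constraint becomes trusted again.
ALTER TABLE [dbo].[Player] WITH CHECK CHECK CONSTRAINT [FK_Player_PlayerStatus];
```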
As the other answer indicates, you also need an index on your status foreign key, and I would also review the execution plan to see what exactly is going on.
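A sketch of such an index on the Player table from the question (the index name is made up for illustration):

```sql
-- IX_Player_PlayerStatusId is a hypothetical name.
CREATE NONCLUSTERED INDEX [IX_Player_PlayerStatusId]
ON [dbo].[Player] ([PlayerStatusId] ASC);
```

With only about 25 distinct statuses the index is not very selective, so whether the optimizer actually uses it will depend on how many players have the status in question and on which other columns the query needs. The execution plan will show which way it went.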
As far as hardcoding the GUID goes, it's unusual, since GUIDs are generally quite anonymous.
Also, I typically would have a logical column, like IsActive in the status, since you might have several "statuses" which are equivalent logically in certain circumstances, like Status IN ('Closed', 'Locked', 'Suspended', '') => IsInactive = 1, while only ('Locked') => IsLocked = 1. FWIW, I tend not to use a single status string, but to use physical flags for individual states on accounts and then logical combinations of these as logical flags for query criteria.
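A sketch of what such a flag could look like on the PlayerStatus table from the question. The IsActive column and the constraint name are hypothetical additions, not part of the original schema, and GO is the batch separator used by SSMS and sqlcmd:

```sql
-- Hypothetical column and constraint names; not in the original schema.
ALTER TABLE [dbo].[PlayerStatus]
ADD [IsActive] bit NOT NULL
    CONSTRAINT [DF_PlayerStatus_IsActive] DEFAULT (0);
GO

-- Mark which statuses count as active (here only the one named 'Active').
UPDATE [dbo].[PlayerStatus]
SET [IsActive] = 1
WHERE [PlayerStatusName] = N'Active';
GO

-- Queries then filter on the meaning, not on a name or a GUID.
SELECT p.PlayerId, p.PlayerName, p.PlayerSurname
FROM dbo.Player AS p
JOIN dbo.PlayerStatus AS s ON s.PlayerStatusId = p.PlayerStatusId
WHERE s.IsActive = 1;
```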
I re-read what you posted, and as far as your execution plan goes, it is going to change depending on the statistics on the table. I doubt very much that the plan would be the same for 100 players as it would be for 1 million players, so definitely check that before you try to do any premature optimization. Also, in testing, be sure statistics are updated: sometimes a plan which is good for a million rows will freak out for one row.
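A minimal sketch of that testing routine, using the table names from the question:

```sql
-- Refresh statistics before comparing plans
-- (FULLSCAN reads every row, so it can be slow on large tables).
UPDATE STATISTICS [dbo].[Player] WITH FULLSCAN;
UPDATE STATISTICS [dbo].[PlayerStatus] WITH FULLSCAN;

-- Report logical reads and CPU/elapsed time for each statement in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- ... run the original and the rewritten query here and compare the output ...

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```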