SQL Server: Clustering by timestamp; pros/cons
I have a table in SQL Server where I want inserts to be added to the end of the table (as opposed to a clustering key that would cause them to be inserted in the middle). This means I want the table clustered by some column that will constantly increase.
This could be achieved by clustering on a datetime column:
CREATE TABLE Things (
...
CreatedDate datetime DEFAULT getdate(),
[timestamp] timestamp,
CONSTRAINT [IX_Things] UNIQUE CLUSTERED (CreatedDate)
)
But I can't guarantee that two Things won't have the same time. So my requirements can't really be achieved by a datetime column.
I could add a dummy identity int column, and cluster on that:
CREATE TABLE Things (
...
RowID int IDENTITY(1,1),
[timestamp] timestamp,
CONSTRAINT [IX_Things] UNIQUE CLUSTERED (RowID)
)
But you'll notice that my table already contains a timestamp column, a column which is guaranteed to be monotonically increasing. This is exactly the characteristic I want for a candidate cluster key.
So I cluster the table on the rowversion (aka timestamp) column:
CREATE TABLE Things (
...
[timestamp] timestamp,
CONSTRAINT [IX_Things] UNIQUE CLUSTERED (timestamp)
)
Rather than adding a dummy identity int column (RowID) to ensure an order, I use what I already have.
What I'm looking for are thoughts on why this is a bad idea, and what other ideas are better.
Note: Community wiki, since the answers are subjective.
Comments (4)
That might sound like a good idea at first - but it's really almost the worst option you have. Why?
The main requirements for a clustered key are (see Kim Tripp's blog post for more excellent details): stable, narrow, unique, and ideally ever-increasing.
Your rowversion violates the stable requirement, and that's probably the most important one. The rowversion of a row changes with each modification to the row - and since your clustering key is added to each and every non-clustered index on the table, your server will be constantly updating loads of non-clustered indexes and wasting a lot of time doing so.
In the end, adding a dummy identity column is probably a much better alternative for your case. The second-best choice would be the datetime column - but here, with a non-unique clustered index, you do run the risk of SQL Server having to add "uniqueifiers" to your entries when duplicates occur - and with a 3.33 ms accuracy, this could definitely happen - not optimal, but definitely much better than the rowversion idea...
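To make the stability problem concrete, here is a minimal T-SQL sketch showing that a rowversion value is replaced whenever its row is updated. The temp table and column names (#RowVersionDemo, Payload, RowVer) are made up for illustration and are not from the question; the variable initializers assume SQL Server 2008 or later.

```sql
-- Hypothetical demo table; all names here are illustrative only.
CREATE TABLE #RowVersionDemo (
    Id      int         NOT NULL PRIMARY KEY,
    Payload varchar(50) NOT NULL,
    RowVer  rowversion
);

INSERT INTO #RowVersionDemo (Id, Payload) VALUES (1, 'first');

-- Capture the rowversion assigned at insert time.
DECLARE @before binary(8) = (SELECT RowVer FROM #RowVersionDemo WHERE Id = 1);

-- Any update to the row makes SQL Server assign a new rowversion.
UPDATE #RowVersionDemo SET Payload = 'changed' WHERE Id = 1;

DECLARE @after binary(8) = (SELECT RowVer FROM #RowVersionDemo WHERE Id = 1);

-- Compare the two values side by side.
SELECT @before AS RowVerBefore,
       @after  AS RowVerAfter,
       CASE WHEN @after > @before THEN 'changed' ELSE 'unchanged' END AS Result;

DROP TABLE #RowVersionDemo;
```

If the rowversion column were the clustering key, each such update would move the row's key value to the current end of the index and touch every non-clustered index that carries that key.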
From the link:
timestamp
in the question: and
So why on earth would you want to cluster by either, especially since their values always change when the row is updated? Just use an identity as the PK and cluster on it.
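A rough sketch of that suggestion, assuming the table from the question. The constraint names PK_Things and DF_Things_CreatedDate are invented here, and the question's other columns (elided there) are left out:

```sql
-- Sketch only: cluster on an identity primary key and keep the
-- rowversion column purely for change detection.
CREATE TABLE Things (
    RowID       int IDENTITY(1,1) NOT NULL,
    CreatedDate datetime NOT NULL
        CONSTRAINT DF_Things_CreatedDate DEFAULT (getdate()),
    [timestamp] rowversion,
    -- RowID never changes after insert, so the clustering key stays stable.
    CONSTRAINT PK_Things PRIMARY KEY CLUSTERED (RowID)
);
```

New rows still land at the end of the clustered index, and updates leave the key untouched.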
You were on the right track already. You can use a datetime column that holds the created date and create a clustered but non-unique index on it.
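As a sketch of what that might look like (the index name IX_Things_CreatedDate and the default constraint name are assumptions, and the question's other columns are omitted):

```sql
-- Sketch only: a clustered index without the UNIQUE keyword.
CREATE TABLE Things (
    CreatedDate datetime NOT NULL
        CONSTRAINT DF_Things_CreatedDate DEFAULT (getdate()),
    [timestamp] rowversion
);

-- Rows that share a CreatedDate value are allowed; SQL Server
-- disambiguates them internally with a hidden uniqueifier.
CREATE CLUSTERED INDEX IX_Things_CreatedDate
    ON Things (CreatedDate);
```

The trade-off is the extra uniqueifier carried by rows with duplicate key values, which the identity approach avoids.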
If this table gets a lot of inserts, you might be creating a hot spot that interferes with updates, because all of the inserts will be happening on the same physical/index pages. Check your locking setup.
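One way to look for such a hot spot, offered as a sketch rather than a full diagnosis, is to read the latch and lock wait counters that SQL Server keeps per index. The object name dbo.Things is assumed from the question, and querying this view requires the appropriate VIEW SERVER STATE or VIEW DATABASE STATE permission.

```sql
-- Sketch only: per-index insert, latch, and lock wait counters
-- for the (assumed) dbo.Things table in the current database.
SELECT  i.name AS index_name,
        s.leaf_insert_count,
        s.page_latch_wait_count,
        s.page_latch_wait_in_ms,
        s.row_lock_wait_count,
        s.page_lock_wait_count
FROM    sys.dm_db_index_operational_stats(
            DB_ID(), OBJECT_ID(N'dbo.Things'), NULL, NULL) AS s
JOIN    sys.indexes AS i
        ON  i.object_id = s.object_id
        AND i.index_id  = s.index_id;
```

High page latch waits on the clustered index relative to its insert count would be consistent with contention on the last page; the counters are cumulative since the index metadata was last cached, so compare two snapshots rather than reading a single one.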