SQL Server: Clustering by timestamp; pros/cons

Posted 2024-08-29 22:09:10

I have a table in SQL Server where I want inserts to be added to the end of the table (as opposed to a clustering key that would cause them to be inserted in the middle). This means I want the table clustered by some column whose value constantly increases.

This could be achieved by clustering on a datetime column:

CREATE TABLE Things (
    ...
    CreatedDate datetime DEFAULT getdate(),
    [timestamp] timestamp,        

    CONSTRAINT [IX_Things] UNIQUE CLUSTERED (CreatedDate)
)

But I can't guarantee that two Things won't have the same time, so my requirements can't really be met by a datetime column.
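Here is a minimal sketch of the collision. It assumes SQL Server 2016 or later (for `DROP TABLE IF EXISTS`) and uses a hypothetical temp table `#ThingsByDate` in place of `Things`. Because `datetime` values are rounded to increments of .000, .003 or .007 seconds, two rows created 1 ms apart can end up with the same stored `CreatedDate`. The `UNIQUE CLUSTERED` constraint then rejects the second row.

```sql
-- Hypothetical scratch table mirroring the first design in the question.
DROP TABLE IF EXISTS #ThingsByDate;   -- SQL Server 2016 or later

CREATE TABLE #ThingsByDate (
    Name        varchar(20) NOT NULL,
    CreatedDate datetime    NOT NULL DEFAULT getdate(),
    UNIQUE CLUSTERED (CreatedDate)
);

-- datetime rounds to .000/.003/.007 seconds, so .001 is stored as .000.
INSERT INTO #ThingsByDate (Name, CreatedDate)
VALUES ('first', '2024-01-01T00:00:00.000');

BEGIN TRY
    INSERT INTO #ThingsByDate (Name, CreatedDate)
    VALUES ('second', '2024-01-01T00:00:00.001');
END TRY
BEGIN CATCH
    -- The second row collides with the first on the clustered key.
    PRINT ERROR_MESSAGE();
END CATCH;

SELECT Name, CreatedDate FROM #ThingsByDate;
```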

I could add a dummy identity int column, and cluster on that:

CREATE TABLE Things (
    ...
    RowID int IDENTITY(1,1),
    [timestamp] timestamp,        

    CONSTRAINT [IX_Things] UNIQUE CLUSTERED (RowID)
)

But you'll notice that my table already contains a timestamp column, a column which is guaranteed to be monotonically increasing. This is exactly the characteristic I want for a candidate cluster key.

So I cluster the table on the rowversion (aka timestamp) column:

CREATE TABLE Things (
    ...
    [timestamp] timestamp,        

    CONSTRAINT [IX_Things] UNIQUE CLUSTERED (timestamp)
)

Rather than adding a dummy identity int column (RowID) to ensure an order, I use what I already have.

What I'm looking for is reasons why this is a bad idea, and which other ideas are better.

Note: Community wiki, since the answers are subjective.

Comments (4)

叫思念不要吵 2024-09-05 22:09:10

"So I cluster the table on the rowversion (aka timestamp) column: Rather than adding a dummy identity int column (RowID) to ensure an order, I use what I already have."

That might sound like a good idea at first - but it's really almost the worst option you have. Why?

The main requirements for a clustered key are (see Kim Tripp's blog post for more excellent details):

  • stable
  • narrow
  • unique
  • ever-increasing if possible

Your rowversion violates the stable requirement, and that's probably the most important one. The rowversion of a row changes with every modification to the row, so every update also changes the row's clustering key and forces SQL Server to move the row within the clustered index. And since the clustering key is stored in every non-clustered index on the table (as the row locator), all of those indexes must be updated too. Your server will constantly be updating loads of non-clustered indexes and wasting a lot of time doing so.
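The instability is easy to observe. This sketch uses a hypothetical temp table `#ThingsByVersion` that is clustered on a rowversion column, as in the question. It assumes SQL Server 2016 or later for `DROP TABLE IF EXISTS`. After a single `UPDATE`, the row's rowversion is different, and so is its clustering key.

```sql
-- Hypothetical scratch table clustered on rowversion, as in the question.
DROP TABLE IF EXISTS #ThingsByVersion;

CREATE TABLE #ThingsByVersion (
    Name   varchar(20) NOT NULL,
    RowVer rowversion,
    UNIQUE CLUSTERED (RowVer)
);

INSERT INTO #ThingsByVersion (Name) VALUES ('widget');

DECLARE @before binary(8) = (SELECT RowVer FROM #ThingsByVersion);

-- Every update assigns a new, higher rowversion, so the clustering key
-- itself changes and the row moves to the end of the clustered index.
UPDATE #ThingsByVersion SET Name = 'gadget';

DECLARE @after binary(8) = (SELECT RowVer FROM #ThingsByVersion);

SELECT @before AS RowVerBefore,
       @after  AS RowVerAfter,
       CASE WHEN @after > @before THEN 'key changed' ELSE 'key unchanged' END AS Result;
```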

In the end, adding a dummy identity column is probably a much better alternative for your case. The second-best choice would be the datetime column. With a non-unique clustered index on it, however, you run the risk of SQL Server having to add "uniquifiers" to your entries when duplicates occur. Since datetime is only accurate to about 3.33 ms, that could definitely happen. It's not optimal, but it's definitely much better than the rowversion idea...
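Below is a sketch of the identity-based layout, using hypothetical names (`#ThingsById`, `IX_ThingsById_CreatedDate`). The 4-byte `RowID` is the clustering key: it is narrow, unique and ever-increasing, and it never changes on update. `CreatedDate` keeps its own non-clustered index for date-range queries. Duplicate dates cause no problem because uniqueness comes from `RowID`.

```sql
-- Hypothetical layout: identity as the clustering key, date indexed separately.
DROP TABLE IF EXISTS #ThingsById;

CREATE TABLE #ThingsById (
    RowID       int IDENTITY(1,1) NOT NULL,
    Name        varchar(20)       NOT NULL,
    CreatedDate datetime          NOT NULL DEFAULT getdate(),
    PRIMARY KEY CLUSTERED (RowID)
);

-- Rows of this index carry RowID as the row locator; RowID never changes,
-- so updating other columns (like Name) does not touch this index.
CREATE NONCLUSTERED INDEX IX_ThingsById_CreatedDate ON #ThingsById (CreatedDate);

-- Identical CreatedDate values are accepted.
INSERT INTO #ThingsById (Name, CreatedDate)
VALUES ('first',  '2024-01-01T00:00:00.000'),
       ('second', '2024-01-01T00:00:00.000');

UPDATE #ThingsById SET Name = 'renamed' WHERE RowID = 1;

SELECT RowID, Name, CreatedDate FROM #ThingsById ORDER BY RowID;
```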

一瞬间的火花 2024-09-05 22:09:10

From the documentation linked for timestamp in the question:

The timestamp syntax is deprecated. This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.

and

Duplicate rowversion values can be generated by using the SELECT INTO statement in which a rowversion column is in the SELECT list. We do not recommend using rowversion in this manner.
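If the table keeps its row-version column, the documentation's advice comes down to declaring it with the `rowversion` type name instead of `timestamp`. This sketch also leaves the column out of the clustering key. It is a minimal example with hypothetical names (`#ThingsModern`, `RowVer`), and the elided columns of the original table are omitted.

```sql
-- Hypothetical table: rowversion declared with the non-deprecated name,
-- kept as an ordinary column rather than the clustering key.
DROP TABLE IF EXISTS #ThingsModern;

CREATE TABLE #ThingsModern (
    RowID  int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    Name   varchar(20) NOT NULL,
    RowVer rowversion            -- instead of: [timestamp] timestamp
);

INSERT INTO #ThingsModern (Name) VALUES ('a'), ('b');

SELECT RowID, Name, RowVer FROM #ThingsModern ORDER BY RowID;
```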

So why on earth would you want to cluster on a timestamp/rowversion column, especially since its value always changes when the row is updated? Just use an identity as the PK and cluster on it.

So要识趣 2024-09-05 22:09:10

You were on the right track already. You can use a DateTime column that holds the created date and create a clustered but non-unique index on it.

CREATE TABLE Things (
    ...
    CreatedDate datetime DEFAULT getdate(),
    [timestamp] timestamp
)

CREATE CLUSTERED INDEX [IX_CreatedDate] ON [Things]
(
    [CreatedDate] ASC
)

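The scratch version below uses a hypothetical temp table `#ThingsNonUnique` to show that this non-unique clustered index accepts duplicate dates. For each duplicate key value, SQL Server adds a hidden 4-byte uniquifier, which is the overhead of this approach. The final query lists the dates that occur more than once, which are the keys that need a uniquifier. It assumes SQL Server 2016 or later for `DROP TABLE IF EXISTS`.

```sql
-- Hypothetical scratch copy of the design above.
DROP TABLE IF EXISTS #ThingsNonUnique;

CREATE TABLE #ThingsNonUnique (
    Name        varchar(20) NOT NULL,
    CreatedDate datetime    NOT NULL DEFAULT getdate()
);

-- Clustered but not unique: duplicate keys get a hidden 4-byte uniquifier.
CREATE CLUSTERED INDEX IX_CreatedDate ON #ThingsNonUnique (CreatedDate ASC);

INSERT INTO #ThingsNonUnique (Name, CreatedDate)
VALUES ('first',  '2024-01-01T00:00:00.000'),
       ('second', '2024-01-01T00:00:00.000');

-- Dates that occur more than once.
SELECT CreatedDate, COUNT(*) AS RowsWithThisDate
FROM #ThingsNonUnique
GROUP BY CreatedDate
HAVING COUNT(*) > 1;
```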
千柳 2024-09-05 22:09:10

If this table gets a lot of inserts, you might be creating a hot spot that interferes with updates, because all of the inserts will be happening on the same physical/index pages. Check your locking setup.
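One way to check the locking setup is to read the lock options from the catalog views. To keep it self-contained, this sketch runs against a hypothetical temp table `#ThingsLocks`. For the real table, query `sys.indexes` and `sys.tables` in the table's own database with `OBJECT_ID(N'dbo.Things')`, where `dbo` is an assumed schema.

```sql
-- Hypothetical table standing in for Things; temp tables live in tempdb.
DROP TABLE IF EXISTS #ThingsLocks;

CREATE TABLE #ThingsLocks (
    RowID int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    Name  varchar(20) NULL
);

-- Row/page lock options per index, plus the table's lock escalation mode.
SELECT i.name            AS IndexName,
       i.type_desc,
       i.allow_row_locks,
       i.allow_page_locks,
       t.lock_escalation_desc
FROM tempdb.sys.indexes AS i
JOIN tempdb.sys.tables  AS t ON t.object_id = i.object_id
WHERE t.object_id = OBJECT_ID(N'tempdb..#ThingsLocks');
```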
