SQL DB index recommendation

Posted 2024-10-03 02:16:27

I am trying to see if using a custom index for a specific type of data might reduce fragmentation in my database.

[Edit: we are using MS SQL Server 2008 R2]

I have an SQL database containing timestamped measurement data. Lots of data is inserted all the time, but once inserted it practically never needs to be updated. These timestamps are, however, not unique, as several devices (around 50 of them) measure the data at the same time.

This means that every 50 rows in the table contain equal timestamp values. This data is received more or less simultaneously, although I could take additional care to ensure that rows are written as sequentially as possible (if that would help), perhaps by keeping them in memory for some time and then writing only when I get the data from all the devices for a single timestamp.

We are using NHibernate with Guid.Comb to avoid the index lookups we would have with plain bigint IDs. Unlike plain GUIDs, this should reduce fragmentation, but with this many inserts fragmentation still happens very quickly.

Since my data is timestamped, and data is inserted almost sequentially (increasing timestamps), I am wondering if there is a smarter way to create a primary key with a unique clustered index for this table. The timestamp column is basically a bigint number (.NET DateTime ticks).

I have also noticed that a non-clustered index over that same timestamp column also gets pretty fragmented. So what index strategy would you recommend to reduce heap fragmentation in this case?
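For what it's worth, fragmentation can be measured directly with the sys.dm_db_index_physical_stats DMV on SQL Server 2008 R2, which makes it easier to compare keying strategies. A minimal sketch, assuming the measurement table is named dbo.Measurements (the name is a placeholder):

```sql
-- Report fragmentation for every index on the measurement table.
-- 'dbo.Measurements' is an assumed placeholder name; substitute the real table.
SELECT
    i.name                            AS index_name,
    ps.index_type_desc,
    ps.avg_fragmentation_in_percent,
    ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Measurements'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id AND i.index_id = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;
```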

Comments (2)

亽野灬性zι浪 2024-10-10 02:16:27

Maybe take a look at this answer; the HiLo approach looks interesting.

Also, maybe your fragmentation is not a result of the discrepancy between the ordering of the index values and the order in which they are inserted, but of natural file growth (as explained here)?
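If file growth is indeed the culprit, the autogrowth settings can be inspected and changed in T-SQL. A minimal sketch, assuming a database named MyDb with a data file whose logical name is MyDb_Data (both names are placeholders):

```sql
-- Inspect the current autogrowth settings for the database's files.
-- growth is in 8 KB pages when is_percent_growth = 0.
SELECT name, size, growth, is_percent_growth
FROM sys.database_files;

-- Switch from small or percentage-based growth to a fixed, larger increment,
-- which reduces the number of growth events and the fragmentation they cause.
-- 'MyDb' and 'MyDb_Data' are assumed placeholder names.
ALTER DATABASE MyDb
MODIFY FILE (NAME = N'MyDb_Data', FILEGROWTH = 512MB);
```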

若水般的淡然安静女子 2024-10-10 02:16:27

A separate column for a key doesn't make a lot of sense for this table, since you won't be updating any of the data. I imagine you'll be doing a lot of queries, though, probably based on that timestamp column.

You could try making the primary key a combination of the timestamp column and a device id column, and making that combination the clustered index. That should allow you to write nearly as fast as possible. If you query by device, however, you may need another index on device id and timestamp (the reverse order). I wouldn't make that reverse order the clustered one, though, as it would make the writes happen all over the table rather than on the trailing pages. And if most queries involve a date range and more than one device, clustering on the timestamp first should give you the best performance.
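A minimal sketch of that layout, with assumed table and column names (MeasurementTime holding the DateTime ticks as bigint):

```sql
-- Composite clustered primary key: new rows (increasing timestamps) append to the
-- trailing pages, so inserts stay mostly sequential.
CREATE TABLE dbo.Measurements
(
    MeasurementTime bigint NOT NULL,   -- .NET DateTime ticks
    DeviceId        int    NOT NULL,
    Value           float  NOT NULL,
    CONSTRAINT PK_Measurements
        PRIMARY KEY CLUSTERED (MeasurementTime, DeviceId)
);

-- Optional "reverse" index for per-device queries; kept non-clustered so that
-- inserts still land on the trailing pages of the clustered index.
CREATE NONCLUSTERED INDEX IX_Measurements_Device_Time
    ON dbo.Measurements (DeviceId, MeasurementTime);
```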
