填充因子在物理上是如何分配的?

发布于 2024-08-03 10:30:25 字数 521 浏览 7 评论 0原文

我一直在网上搜索书籍和谷歌咒语,试图找出叶页(SQL Server 2000 和 2005)中的实际填充因子。

我知道它是创建索引时页面上剩余的可用空间量,但我没有发现该空间实际上是如何剩余的:即,它是朝向索引的一大块吗?页面末尾,或者该数据中有几个间隙。

例如,[只是为了简单起见],假设一个页面只能容纳 100 行。如果填充因子被指定为 75%,这是否意味着页面的前(或后)75% 是数据,其余部分是空闲的, 每四行都是空闲的(即,页面看起来像:数据,数据,数据,免费,数据,数据,数据,免费,...)。

总而言之,我正在准确了解将行插入到具有聚集索引的表中时发生的物理操作,并且插入不会发生在行的末尾。如果页面中留有多个间隙,则插入的影响最小(至少在页面拆分之前),因为可能需要移动以容纳插入的行数最小化。如果间隙位于表中的一大块中,那么处理各行的开销将(至少在理论上)显着增加。

如果有人知道 MSDN 参考资料,请指出我!我现在找不到(尽管仍在寻找)。从我读到的内容来看,暗示存在很多差距 - 但这似乎没有明确说明。

I've been trawling books online and google incantations trying to find out what fill factor physically is in a leaf-page (SQL Server 2000 and 2005).

I understand that its the amount of room left free on a page when an index is created, but what I've not found is how that space is actually left: i.e., is it one big chunk towards the end of the page, or is it several gaps through that data.

For example, [just to keep the things simple], assume a page can only hold 100 rows. If the fill-factor is stated to be 75%, does this mean that the first (or last) 75% of the page is data and the rest is free, or is every fourth row free (i.e., the page looks like: data, data, data, free, data, data, data, free, ...).

The long and short of this is that I'm getting a handle on exactly what happens in terms of physical operations that occur when inserting a row into a table with a clustered index, and the insert isn't happening at the end of the row. If multiple gaps are left throught a page, then an insert has minimal impact (at least until a page split) as the number of rows that may need to be moved to accomodate the insert is minimised. If the gap is in one big chunk in the table, then the overhead to juggle the rows around would (in theory at least) be significantly more.

If someone knows an MSDN reference, point me to it please! I can't find one at the moment (still looking though). From what I've read it's implied that it's many gaps - but this doesn't seem to be explicitly stated.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

指尖上得阳光 2024-08-10 10:30:25

来自 MSDN

填充因子设置仅在创建或重建索引时适用。 SQL Server 数据库引擎 不会动态地在页面中保留指定百分比的空白空间。尝试维持数据页上的额外空间将无法实现填充因子的目的,因为数据库引擎必须执行页面拆分以维持每个页面上填充因子指定的可用空间百分比:数据已输入。

并且,进一步:

将新行添加到完整索引页时,数据库引擎会将大约一半的行移动到新页,以为新行腾出空间。这种重组称为页面拆分。页面拆分为新记录腾出了空间,但执行起来可能需要时间,并且是一项资源密集型操作。此外,它还可能导致碎片,从而导致 I/O 操作增加。当频繁发生页面分割时,可以通过使用新的或现有的填充因子值来重新分配数据来重建索引。

SQL Server 的数据页由以下元素组成:

  • 页头96 字节,固定。
  • 数据:变量
  • 行偏移数组:变量。

行偏移数组始终存储在页面的末尾向后增长。

数组的每个元素都是 2 字节值,保存页面内每行开头的偏移量。

行在数据页内没有排序:相反,它们的顺序(在集群存储的情况下)由行偏移数组确定。这是已排序的行偏移量。

说,如果我们将一个簇键值为 10100 字节行插入到一个簇表中,并且它进入一个空闲页面,它会按如下方式插入:

[00   - 95   ]   Header
[96   - 195  ]   Row 10
[196  - 8190 ]   Free space
[8190 - 8191 ]   Row offset array: [96]

比方 在同一页面中插入一个新行,这次的簇键值为 9

[00   - 95   ]   Header
[96   - 195  ]   Row 10
[196  - 295  ]   Row 9
[296  - 8188 ]   Free space
[8188 - 8191 ]   Row offset array: [196] [96]

该行在逻辑上前置,但在物理上附加

偏移数组被重新排序以反映行的逻辑顺序。

鉴于此,我们可以很容易地看到行从页面的开头开始附加到可用空间,而指向行的指针附加到可用空间从页面末尾开始。

From MSDN:

The fill-factor setting applies only when the index is created, or rebuilt. The SQL Server Database Engine does not dynamically keep the specified percentage of empty space in the pages. Trying to maintain the extra space on the data pages would defeat the purpose of fill factor because the Database Engine would have to perform page splits to maintain the percentage of free space specified by the fill factor on each page as data is entered.

and, further:

When a new row is added to a full index page, the Database Engine moves approximately half the rows to a new page to make room for the new row. This reorganization is known as a page split. A page split makes room for new records, but can take time to perform and is a resource intensive operation. Also, it can cause fragmentation that causes increased I/O operations. When frequent page splits occur, the index can be rebuilt by using a new or existing fill factor value to redistribute the data.

SQL Server's data page consists of the following elements:

  • Page header: 96 bytes, fixed.
  • Data: variable
  • Row offset array: variable.

The row offset array is always stored at the end of the page and grows backwards.

Each element of the array is the 2-byte value holding the offset to the beginning of each row within the page.

Rows are not ordered within the data page: instead, their order (in case of clustered storage) is determined by the row offset array. It's the row offsets that are sorted.

Say, if we insert a 100-byte row with cluster key value of 10 into a clustered table and it goes into a free page, it gets inserted as following:

[00   - 95   ]   Header
[96   - 195  ]   Row 10
[196  - 8190 ]   Free space
[8190 - 8191 ]   Row offset array: [96]

Then we insert a new row into the same page, this time with the cluster key value of 9:

[00   - 95   ]   Header
[96   - 195  ]   Row 10
[196  - 295  ]   Row 9
[296  - 8188 ]   Free space
[8188 - 8191 ]   Row offset array: [196] [96]

The row is prepended logically but appended physically.

The offset array is reordered to reflect the logical order of the rows.

Given this, we can easily see that the rows are appended to the free space, starting from the beginning on the page, while pointers to the rows are prepended to the free space starting from the end of the page.

满意归宿 2024-08-10 10:30:25

这是我第一次想到这个,我对这个结论并不肯定,但是,

既然SQL Server在一次Read IO中可以检索到的最小数据量是一个完整的数据页,为什么单个页面中的任何行是否需要首先进行排序?我敢打赌它们不是,所以即使最后的间隙全部在一个大间隙中,新记录也可以添加到最后,无论这是否是正确的排序顺序。 (如果没有理由首先对页面上的记录进行排序)

其次,考虑 IO 的写入端,我认为最小的写入块也是整个页面,(即使是最小的更改也需要整个页面)页被写回磁盘)。这意味着每次写入页面时,页面上的所有行都可以在内存中进行排序,因此即使您插入到 dingle 页面上已排序的行集的开头,整个页面也会被读出,新记录可以插入到内存中集合中正确的插槽中,然后整个新排序的页面被写回磁盘......

This is the first time I've thought of this, and I'm not positive about the conclusion, but,

Since the smallest amount of data that can be retrieved by SQL Server in a single Read IO is one complete page of data, why would any of the rows within a single page need to be sorted in the first place? I'd bet that they're not, so that even if the gap is all in one big gap at the end, new records can be added at the end regardless of whether that's the right sort order. (if there's no reason to sort records on a page in the first place)

And, secondly, thinking about the write side of thge IO, I think the smallest write chunk is an entire page as well, (even the smallest change requires the entire page be written back to disk). This means that all the rows on a page could get sorted in memory every time the page is written to, so even if you were inserting into the beginning of a sorted set of rows on a dingle page, the whole page gets read out, the new record could be inserted into it's proper slot in the set in memory, and then the whole new sorted page gets written back to disk...

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文