填充因子在物理上是如何分配的?
我一直在网上搜索书籍和谷歌咒语,试图找出叶页(SQL Server 2000 和 2005)中的实际填充因子。
我知道它是创建索引时页面上剩余的可用空间量,但我没有发现该空间实际上是如何剩余的:即,它是朝向索引的一大块吗?页面末尾,或者该数据中有几个间隙。
例如,[只是为了简单起见],假设一个页面只能容纳 100 行。如果填充因子被指定为 75%,这是否意味着页面的前(或后)75% 是数据,其余部分是空闲的,或 每四行都是空闲的(即,页面看起来像:数据,数据,数据,免费,数据,数据,数据,免费,...)。
总而言之,我正在准确了解将行插入到具有聚集索引的表中时发生的物理操作,并且插入不会发生在行的末尾。如果页面中留有多个间隙,则插入的影响最小(至少在页面拆分之前),因为可能需要移动以容纳插入的行数最小化。如果间隙位于表中的一大块中,那么处理各行的开销将(至少在理论上)显着增加。
如果有人知道 MSDN 参考资料,请指出我!我现在找不到(尽管仍在寻找)。从我读到的内容来看,暗示存在很多差距 - 但这似乎没有明确说明。
I've been trawling books online and google incantations trying to find out what fill factor physically is in a leaf-page (SQL Server 2000 and 2005).
I understand that its the amount of room left free on a page when an index is created, but what I've not found is how that space is actually left: i.e., is it one big chunk towards the end of the page, or is it several gaps through that data.
For example, [just to keep the things simple], assume a page can only hold 100 rows. If the fill-factor is stated to be 75%, does this mean that the first (or last) 75% of the page is data and the rest is free, or is every fourth row free (i.e., the page looks like: data, data, data, free, data, data, data, free, ...).
The long and short of this is that I'm getting a handle on exactly what happens in terms of physical operations that occur when inserting a row into a table with a clustered index, and the insert isn't happening at the end of the row. If multiple gaps are left throught a page, then an insert has minimal impact (at least until a page split) as the number of rows that may need to be moved to accomodate the insert is minimised. If the gap is in one big chunk in the table, then the overhead to juggle the rows around would (in theory at least) be significantly more.
If someone knows an MSDN reference, point me to it please! I can't find one at the moment (still looking though). From what I've read it's implied that it's many gaps - but this doesn't seem to be explicitly stated.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
来自
MSDN
:并且,进一步:
SQL Server
的数据页由以下元素组成:页头
:96
字节,固定。数据
:变量行偏移数组
:变量。行偏移数组始终存储在页面的末尾并向后增长。
数组的每个元素都是
2
字节值,保存页面内每行开头的偏移量。行在数据页内没有排序:相反,它们的顺序(在集群存储的情况下)由行偏移数组确定。这是已排序的行偏移量。
说,如果我们将一个簇键值为
10
的100
字节行插入到一个簇表中,并且它进入一个空闲页面,它会按如下方式插入:比方 在同一页面中插入一个新行,这次的簇键值为
9
:该行在逻辑上前置,但在物理上附加。
偏移数组被重新排序以反映行的逻辑顺序。
鉴于此,我们可以很容易地看到行从页面的开头开始附加到可用空间,而指向行的指针附加到可用空间从页面末尾开始。
From
MSDN
:and, further:
SQL Server
's data page consists of the following elements:Page header
:96
bytes, fixed.Data
: variableRow offset array
: variable.The row offset array is always stored at the end of the page and grows backwards.
Each element of the array is the
2
-byte value holding the offset to the beginning of each row within the page.Rows are not ordered within the data page: instead, their order (in case of clustered storage) is determined by the row offset array. It's the row offsets that are sorted.
Say, if we insert a
100
-byte row with cluster key value of10
into a clustered table and it goes into a free page, it gets inserted as following:Then we insert a new row into the same page, this time with the cluster key value of
9
:The row is prepended logically but appended physically.
The offset array is reordered to reflect the logical order of the rows.
Given this, we can easily see that the rows are appended to the free space, starting from the beginning on the page, while pointers to the rows are prepended to the free space starting from the end of the page.
这是我第一次想到这个,我对这个结论并不肯定,但是,
既然SQL Server在一次Read IO中可以检索到的最小数据量是一个完整的数据页,为什么单个页面中的任何行是否需要首先进行排序?我敢打赌它们不是,所以即使最后的间隙全部在一个大间隙中,新记录也可以添加到最后,无论这是否是正确的排序顺序。 (如果没有理由首先对页面上的记录进行排序)
其次,考虑 IO 的写入端,我认为最小的写入块也是整个页面,(即使是最小的更改也需要整个页面)页被写回磁盘)。这意味着每次写入页面时,页面上的所有行都可以在内存中进行排序,因此即使您插入到 dingle 页面上已排序的行集的开头,整个页面也会被读出,新记录可以插入到内存中集合中正确的插槽中,然后整个新排序的页面被写回磁盘......
This is the first time I've thought of this, and I'm not positive about the conclusion, but,
Since the smallest amount of data that can be retrieved by SQL Server in a single Read IO is one complete page of data, why would any of the rows within a single page need to be sorted in the first place? I'd bet that they're not, so that even if the gap is all in one big gap at the end, new records can be added at the end regardless of whether that's the right sort order. (if there's no reason to sort records on a page in the first place)
And, secondly, thinking about the write side of thge IO, I think the smallest write chunk is an entire page as well, (even the smallest change requires the entire page be written back to disk). This means that all the rows on a page could get sorted in memory every time the page is written to, so even if you were inserting into the beginning of a sorted set of rows on a dingle page, the whole page gets read out, the new record could be inserted into it's proper slot in the set in memory, and then the whole new sorted page gets written back to disk...