Seeking precision on how ROW_OVERFLOW_DATA occurs

Posted 2024-11-08 01:59:22

I'm currently in the initial phases of planning a rewrite for a large module in our CRM application.

One area I'm looking into is database optimization. I haven't made any decisions yet, but I want to make sure I understand the concept of ROW_OVERFLOW_DATA properly: http://msdn.microsoft.com/en-us/library/ms186981.aspx

We are using SQL Server 2005. It's my understanding that the row size limit is 8,060 bytes and that overflow occurs beyond that.

I ran a query to get the maximum row size of each table in a particular read-intensive database:

-- Sum of declared column lengths per user table (excludes per-record overhead)
SELECT OBJECT_NAME(sc.[id]) AS tablename
     , COUNT(1) AS nr_columns
     , SUM(sc.length) AS maxrowlength
FROM syscolumns sc
JOIN sysobjects so
  ON sc.[id] = so.[id]
WHERE so.xtype = 'U'
GROUP BY OBJECT_NAME(sc.[id])
ORDER BY SUM(sc.length) DESC

This gave me a few tables with a maxrowlength slightly above 8,000 but under 10,000. Another query shows that the average row size is actually quite small, around 1,000 bytes.
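One caveat about that query: SUM(sc.length) adds up declared column widths only, while the 8,060-byte limit applies to the whole record, overhead included. The sketch below is a rough model under stated assumptions, based on the FixedVar record layout Paul Randal describes (2 bytes of status bits, a 2-byte fixed-length size field, a 2-byte column count, a NULL bitmap with one bit per column and, when variable-length columns are present, a 2-byte count plus a 2-byte offset per column). It ignores row-versioning tags and the page's slot array. The function `max_record_size` and the example table are hypothetical. Because it assumes every variable-length column is completely full, the result is only a ceiling; whether a particular row overflows depends on that row's actual data.

```python
import math

ROW_LIMIT = 8060  # maximum in-row record size, in bytes


def max_record_size(fixed_widths, var_max_widths):
    """Worst-case record size in bytes under a simplified FixedVar model.

    fixed_widths:   declared widths of the fixed-length columns (int -> 4)
    var_max_widths: declared maximum widths of the variable-length columns
    """
    n_cols = len(fixed_widths) + len(var_max_widths)
    size = 4                       # status bits A and B + fixed-length size field
    size += sum(fixed_widths)      # fixed-length data
    size += 2                      # number of columns
    size += math.ceil(n_cols / 8)  # NULL bitmap, one bit per column
    if var_max_widths:
        size += 2 + 2 * len(var_max_widths)  # variable-length count + offset array
        size += sum(var_max_widths)          # variable-length data, every column full
    return size


# Hypothetical table: int, varchar(4000), varchar(5000)
declared_sum = 4 + 4000 + 5000   # what SUM(sc.length) reports for it
worst_case = max_record_size([4], [4000, 5000])
print(declared_sum, worst_case, worst_case > ROW_LIMIT)  # 9004 9017 True
```

In this model the overhead for a three-column table with two varchar columns is 13 bytes, so a declared sum just under 8,060 can still describe a table whose fullest possible rows overflow.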

My question is: is ROW_OVERFLOW_DATA handled per row or per column? Once a row exceeds the 8,060-byte limit, is the entire column that caused the overflow moved to another page, or only the value in that specific row?

So, for example, given the following simplified schema:

col1 (int) | col2 (varchar(4000)) | col3 (varchar(5000))
    1      | 4000 characters      | 5000 characters   <-- this row overflows
    2      | 4000 characters      | 100 characters
    3      | 150 characters       | 150 characters
    4      | 500 characters       | 600 characters

Would col 3 in every row (1 to 4) be replaced by a 24-byte pointer, or only in row 1?

I'm asking because if every row gets a pointer, fixing it becomes important; if it's only a few rows, maybe we can take the performance hit.

Also, I've seen many blogs suggesting moving nullable columns toward the end of the table so that, if the values are in fact NULL, they don't take up any row space. Is this true? We tend to keep our timestamp and tracking columns at the end because it's easier to visualize. Now I'm wondering whether we should move them further up, since they are never NULL.

野侃 2024-11-15 01:59:22

If a table has a hundred million rows and one of them overflows, would you move the whole column? No.
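A minimal sketch of the per-row behavior, applied to the four rows in the question. It assumes a simplified FixedVar record model (record header, column count, NULL bitmap, variable-length count and offset array, then the data), no NULLs, and one byte per varchar character; `record_size` is a hypothetical helper, not a SQL Server API. Each row is sized on its own data, so only row 1 crosses 8,060 bytes.

```python
import math

ROW_LIMIT = 8060  # maximum in-row record size, in bytes


def record_size(fixed_widths, var_lengths):
    """In-row size of one record (simplified FixedVar model, no NULLs).

    fixed_widths: widths of the fixed-length columns
    var_lengths:  actual byte lengths of this row's variable-length values
    """
    n_cols = len(fixed_widths) + len(var_lengths)
    size = 4 + 2 + math.ceil(n_cols / 8)  # header, column count, NULL bitmap
    size += sum(fixed_widths)             # fixed-length data
    if var_lengths:
        size += 2 + 2 * len(var_lengths)  # variable-length count + offset array
        size += sum(var_lengths)          # variable-length data
    return size


# Rows from the question: col1 int, col2 varchar(4000), col3 varchar(5000)
rows = {1: (4000, 5000), 2: (4000, 100), 3: (150, 150), 4: (500, 600)}

for row_id, (col2, col3) in rows.items():
    size = record_size([4], [col2, col3])
    status = "overflows" if size > ROW_LIMIT else "fits in-row"
    print(row_id, size, status)
# 1 9017 overflows
# 2 4117 fits in-row
# 3 317 fits in-row
# 4 1117 fits in-row
```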

FYI, this TechNet article by Paul Randal is the definitive reference on this:

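The row-overflow feature you're using is great for allowing the occasional row to be longer than 8,060 bytes, but it's not well suited to the majority of rows being oversized and can lead to a drop in query performance, as you're experiencing.
The reason for this is that when a row is about to become oversized, one of the variable-length columns in the row is pushed "off-row." This means that the column is taken from the row on the data or index page and moved to a text page. In place of the old column value, a pointer is substituted that points to the new location of the column value in the data file.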

And from MSDN:

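ROW_OVERFLOW_DATA Allocation Unit
For every partition used by a table (heap or clustered table), index, or indexed view, there is one ROW_OVERFLOW_DATA allocation unit. This allocation unit contains zero (0) pages until a data row with variable-length columns (varchar, nvarchar, varbinary, or sql_variant) in the IN_ROW_DATA allocation unit exceeds the 8 KB row size limit. When the size limitation is reached, SQL Server moves the column with the largest width from that row to a page in the ROW_OVERFLOW_DATA allocation unit. A 24-byte pointer to this off-row data is maintained on the original page.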
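The rule in both quotes can be sketched as a loop: while the record is over the limit, replace the widest remaining variable-length value in that row with a 24-byte pointer. This is a simplified model under assumptions, not SQL Server's actual code. The size arithmetic follows the FixedVar layout (header, column count, NULL bitmap, offset array); repeating until the row fits, and never moving a value of 24 bytes or less, are assumptions; `push_off_row` is a hypothetical name.

```python
import math

ROW_LIMIT = 8060     # maximum in-row record size, in bytes
POINTER_BYTES = 24   # in-row pointer left behind for an off-row value


def record_size(fixed_widths, var_lengths):
    """In-row size of one record (simplified FixedVar model, no NULLs)."""
    n_cols = len(fixed_widths) + len(var_lengths)
    size = 4 + 2 + math.ceil(n_cols / 8) + sum(fixed_widths)
    if var_lengths:
        size += 2 + 2 * len(var_lengths) + sum(var_lengths)
    return size


def push_off_row(fixed_widths, var_lengths):
    """Push the widest variable-length values off-row until the record fits.

    Returns (moved column indexes, final in-row size). Assumption: a value of
    POINTER_BYTES or less is never moved, because that would not shrink the row.
    """
    in_row = list(var_lengths)
    moved = []
    while record_size(fixed_widths, in_row) > ROW_LIMIT:
        candidates = [i for i, n in enumerate(in_row) if n > POINTER_BYTES]
        if not candidates:
            raise ValueError("record cannot be made to fit in-row")
        widest = max(candidates, key=lambda i: in_row[i])
        in_row[widest] = POINTER_BYTES  # value now lives in ROW_OVERFLOW_DATA
        moved.append(widest)
    return moved, record_size(fixed_widths, in_row)


# Row 1 from the question: int, 4000-character col2, 5000-character col3
print(push_off_row([4], [4000, 5000]))  # ([1], 4041)
# Row 2 already fits, so nothing is moved
print(push_off_row([4], [4000, 100]))   # ([], 4117)
```

Indexes are 0-based among the variable-length columns, so `[1]` means only row 1's col3 value goes off-row; row 2 keeps both values in-row and pays no pointer cost.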

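As for your NULLable columns, that advice is wrong. Regardless of the column order in the table definition, the on-disk record stores all fixed-length columns first and the variable-length columns at the end, with the NULL bitmap in between. A fixed-length column takes its full width even when it is NULL, so moving it to the end of the definition saves nothing. For reference, see Paul Randal: Inside the Storage Engine: Anatomy of a record, and a previous answer of mine here.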
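A minimal sketch of the NULL arithmetic under a simplified FixedVar model. Assumptions: fixed-length columns always occupy their full width, NULL or not; a NULL variable-length value stores no data; the variable-length offset array only covers columns up to the last non-NULL one, so a trailing NULL varchar saves its 2-byte entry; and variable-length values are listed in storage order. The helper `record_size_with_nulls` and the example columns are hypothetical.

```python
import math


def record_size_with_nulls(fixed_cols, var_cols):
    """In-row size of one record (simplified FixedVar model) with NULLs.

    fixed_cols: (declared_width, is_null) pairs for the fixed-length columns
    var_cols:   byte length of each variable-length value, or None for NULL,
                in storage order
    """
    n_cols = len(fixed_cols) + len(var_cols)
    size = 4 + 2 + math.ceil(n_cols / 8)  # header, column count, NULL bitmap
    # Fixed-length columns take their full width whether or not they are NULL.
    size += sum(width for width, _is_null in fixed_cols)
    # Offset-array entries exist only up to the last non-NULL variable-length value.
    last = max((i for i, n in enumerate(var_cols) if n is not None), default=-1)
    if last >= 0:
        size += 2 + 2 * (last + 1)  # variable-length count + offset array
        size += sum(n for n in var_cols[: last + 1] if n is not None)
    return size


# A NULL 8-byte datetime costs the same as a non-NULL one
print(record_size_with_nulls([(4, False), (8, True)], [100]))   # 123
print(record_size_with_nulls([(4, False), (8, False)], [100]))  # 123
# A trailing NULL varchar saves its 2-byte offset-array entry
print(record_size_with_nulls([(4, False)], [None, 100]))        # 117
print(record_size_with_nulls([(4, False)], [100, None]))        # 115
```

In this model, moving never-NULL fixed-length tracking columns (datetime, int) around the table definition changes nothing; the only ordering effect is 2 bytes per trailing NULL variable-length value.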