SQL 2008 字典数据压缩 - 这只适用于整个列匹配吗?
为简单起见,假设我每行有一个 varchar(50),这些是我的行:
1) 敏捷的棕色狐狸跳过了懒狗 2)我喜欢棕色狐狸 3)狐狸 4)fox
因此页面压缩会找到例如单词fox并将其放入字典中。是否只有第 3 行和第 4 行受益,而第 1 行和第 2 行则不会受益,因为它们还包含其他数据?这似乎是经验上正在发生的事情。我有一堆具有高度重复段(尖括号、重复属性名称等)的 XML 数据,如果我将其保存到 CSV 并将其压缩为控制案例,这些数据的压缩效果会非常好,但在 SQL Server 中我得到的压缩率几乎为 0%。我已经通过查看 sys.dm_db_index_physical_stats page_count 和compressed_page_count 确认确实发生了一些压缩,因此我知道我的结构中没有任何内容本身会阻止压缩(例如太大的行)。
这里的示例仅包含整个列而不包含子集,因此我要明确询问。
For simplicity suppose I have one varchar(50) per row and these are my rows:
1) The quick brown fox jumps over the lazy dog
2) I like brown foxes
3) fox
4) fox
So the page compression would find the word fox for instance and put it in the dictionary. Would only rows 3 and 4 benefit and rows 1 and 2 would not because they contain other data as well? This is what seems to be happening empirically. I have a bunch of XML data with highly repeating segments (angle brackets, repeated attribute names, etc) which compresses fantastically if I save it out to a CSV and zip it as a control case but I get almost 0% compression in SQL server. I've confirmed that SOME compression does take place by looking at sys.dm_db_index_physical_stats page_count and compressed_page_count so I know there's nothing in my structure inherently preventing compression altogether like too large rows.
The example here only includes whole columns and not subsets hence why I am asking explicitly.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
请参阅 页面压缩实现
据我了解,值“fox”将被放置在字典中,并且字典值将替换页面上出现的所有,因此所有行都应该受到影响。
您的数据是存储在 (n)varchar 列中还是 XML 列中?
See Page Compression Implementation
As I understand, the value 'fox' will be placed in the dictionary and the dictionary value will replace all occurences on the page, so all rows should be affected.
Is your data stored in (n)varchar columns or in XML columns?
我忘了提及,除了我所做的实证测试之外,我还向 Microsoft 确认了这一点。是的,字典数据压缩仅适用于整个列或至少前缀匹配。
I forgot to mention that I confirmed that in addition to the empirical testing I did I confirm this with Microsoft. Yes, Dictionary Data Compression is for WHOLE column or at least PREFIX matches only.