What is VertiPaq and how does it work

Posted on 2024-12-02 13:09:57

I am learning about the columnstore index (a Denali CTP3 feature) and found out that it uses the VertiPaq architecture for data compression. I'm interested in what VertiPaq is, how it works, and what its architecture is. I searched Google but found no satisfactory answer. Could anyone explain in detail what it is, how it works, and the algorithm/architecture behind it?

And how does it help with data compression?

深空失忆 2024-12-09 13:09:57

I wrote a blog post on this that hopefully will answer your questions on column store indexes:
http://www.jamesserra.com/archive/2011/08/sql-server-%e2%80%9cdenali%e2%80%9d-project-apollo/

Please let me know if you still have questions.

回梦 2024-12-09 13:09:57

And how it helps in data compression

The compression part works so well because data in the same column often doesn't vary much. Imagine, for example (a simplification), a column that stores the answer to a multiple-choice input with 4 options. There will be just 4 unique values in the column store, even if there are 8 million records in the table. That makes the column values easier to compress, which in turn makes it easier to fit the index into memory and thus faster to query.
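To make the 4-choice example concrete, here is a minimal Python sketch of dictionary encoding: each distinct value is stored once and every row is replaced by a small integer code. It is a simplified illustration of the general columnstore idea, not VertiPaq's actual implementation; the function names (`dictionary_encode`, `dictionary_decode`, `bits_per_code`) and the sample answer labels are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def dictionary_encode(column: List[str]) -> Tuple[List[str], List[int]]:
    """Store each distinct value once and replace every row with its integer code."""
    dictionary: List[str] = []      # distinct values, in first-seen order
    index_of: Dict[str, int] = {}   # value -> position in `dictionary`
    codes: List[int] = []           # one small integer per row
    for value in column:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index_of[value])
    return dictionary, codes


def dictionary_decode(dictionary: List[str], codes: List[int]) -> List[str]:
    """Rebuild the original column by looking each code up in the dictionary."""
    return [dictionary[code] for code in codes]


def bits_per_code(dictionary_size: int) -> int:
    """Fewest bits that can represent codes 0 .. dictionary_size - 1 (at least 1)."""
    return max(1, (dictionary_size - 1).bit_length())


if __name__ == "__main__":
    # Hypothetical multiple-choice answers; 100,000 rows keep the demo fast.
    choices = ["Strongly agree", "Agree", "Disagree", "Strongly disagree"]
    rng = random.Random(42)
    column = [rng.choice(choices) for _ in range(100_000)]

    dictionary, codes = dictionary_encode(column)
    assert dictionary_decode(dictionary, codes) == column  # lossless round trip

    bits = bits_per_code(len(dictionary))
    print(len(dictionary), "distinct values,", bits, "bits per row")
    # Scale the same arithmetic up to the 8 million rows from the example.
    print(8_000_000 * bits // 8, "bytes of codes for 8 million rows")
```

With 4 distinct values each code needs only 2 bits, so the codes for 8 million rows fit in 16,000,000 bits (2,000,000 bytes, about 2 MB) plus a tiny dictionary, no matter how long the original answer strings are.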

When data is stored in column-wise fashion, the data can often be compressed more effectively than when stored in row-wise fashion. Typically there is more redundancy within a column than within a row, which usually means the data can be compressed to a greater degree. When data is more compressed, less IO is required to fetch the data into memory. In addition, a larger fraction of the data can reside in a given size of memory. Reducing IO can significantly speed up query response time. Retaining more of your working set of data in memory will speed up response time for subsequent queries that access the same data.

Source: More details on columnstore technology
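The point that a column usually holds more redundancy than a row can be checked with a generic compressor. The sketch below is an illustration under stated assumptions: it uses Python's `zlib` (DEFLATE), not VertiPaq's own encodings, on a hypothetical sales table (the `make_rows` schema is made up) where dates arrive in runs, there is a single store, and amounts are random. Both layouts hold the same values and have the same uncompressed length (260,000 bytes); only the order differs.

```python
import random
import zlib
from typing import List, Tuple

# Hypothetical schema: (sale_date, store, amount).
Row = Tuple[str, str, int]


def make_rows(n: int, seed: int = 0) -> List[Row]:
    """Hypothetical sales table: dates arrive in runs of 1,000 rows,
    there is a single store, and amounts are random 5-digit integers."""
    rng = random.Random(seed)
    return [
        (f"2024-01-{i // 1000 + 1:02d}", "Store-01", rng.randint(10000, 99999))
        for i in range(n)
    ]


def row_wise_bytes(rows: List[Row]) -> bytes:
    """Row layout: the values of one row sit next to each other."""
    return "".join(f"{d},{s},{a}\n" for d, s, a in rows).encode("ascii")


def column_wise_bytes(rows: List[Row]) -> bytes:
    """Column layout: all values of one column sit next to each other."""
    return "".join(
        "".join(f"{value}\n" for value in column) for column in zip(*rows)
    ).encode("ascii")


def compressed_sizes(rows: List[Row]) -> Tuple[int, int]:
    """Return (row-wise, column-wise) sizes after zlib compression."""
    return (
        len(zlib.compress(row_wise_bytes(rows))),
        len(zlib.compress(column_wise_bytes(rows))),
    )


if __name__ == "__main__":
    rows = make_rows(10_000)
    row_size, column_size = compressed_sizes(rows)
    print("uncompressed:", len(row_wise_bytes(rows)), "bytes in either layout")
    print("row-wise compressed:", row_size, "bytes")
    print("column-wise compressed:", column_size, "bytes")
```

On data like this the column-wise layout should come out noticeably smaller: stored together, the repetitive date and store columns collapse into a handful of long back-references, whereas the row-wise layout interleaves them with random digits, so every row costs at least one extra back-reference. The exact sizes depend on the data and on zlib's settings.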
