Does the order of data in a table affect its performance? RLE compression
I use the Greenplum database, a massively parallel Postgres.
I have a table that is about 100 GB.
It contains data from 2019 up to today. The table is not ordered, but we insert new data every day, so it is roughly sorted by sales date.
I would like to recreate this table, but sort the data before inserting it.
The table is currently compressed with QuickLZ, and we use column-oriented storage. Sorting by a specific key should be beneficial because Greenplum supports RLE: identical values will be stored together.
By recreating the table I hope to reclaim some space.
Would this have any impact on performance?
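One common way to do this kind of sorted rebuild is to create a new table with the same storage options and fill it with an ordered `INSERT`, then swap the names. A minimal sketch, assuming a hypothetical `sales` table distributed by `sale_id` with a `sales_date` column (all names are placeholders; adjust to your real schema):

```sql
-- Create a copy with the same columns, column-oriented storage and QuickLZ.
CREATE TABLE sales_sorted (LIKE sales)
  WITH (appendoptimized=true, orientation=column, compresstype=quicklz)
  DISTRIBUTED BY (sale_id);

-- Ordered insert: sorting by the key you expect RLE to exploit
-- places identical/adjacent values together within each segment file.
INSERT INTO sales_sorted
SELECT * FROM sales
ORDER BY sales_date;

-- Once the copy is verified, swap the tables.
ALTER TABLE sales RENAME TO sales_old;
ALTER TABLE sales_sorted RENAME TO sales;
```

Note that the sort happens per segment after redistribution, which is fine for compression purposes: runs of equal values still end up physically adjacent within each segment's files.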
2 Answers
We have a new compression in Greenplum that could be better than QuickLZ; you can try Zstandard (zstd) compression. Sorting by a column can definitely improve the compression ratio.
As for performance, it depends on what your current bottlenecks are.
In general, more compression is a good thing, but if you are after performance, other factors may matter more.
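To try the Zstandard suggestion, the rebuild only needs a different `compresstype`. A sketch under the same assumptions as above (placeholder table and column names, Greenplum 6+ where zstd is available):

```sql
-- Same sorted rebuild, but compressed with zstd instead of QuickLZ.
CREATE TABLE sales_zstd (LIKE sales)
  WITH (appendoptimized=true, orientation=column,
        compresstype=zstd, compresslevel=5)
  DISTRIBUTED BY (sale_id);

INSERT INTO sales_zstd
SELECT * FROM sales
ORDER BY sales_date;

-- Compare on-disk sizes before committing to the swap.
SELECT pg_size_pretty(pg_total_relation_size('sales'))     AS quicklz_size,
       pg_size_pretty(pg_total_relation_size('sales_zstd')) AS zstd_size;
```

Higher `compresslevel` values trade CPU for ratio; comparing a couple of levels on a sample partition is usually cheaper than rebuilding the whole 100 GB twice.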
Using RLE (which also applies delta compression internally) would definitely be beneficial for your table. Query performance should ideally improve as well, since the better compression ratio reduces the amount of I/O performed.
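In a column-oriented Greenplum table, RLE can be requested per column with an `ENCODING` clause, so only the columns that actually have long runs (low-cardinality or sorted ones) pay for it. A sketch with hypothetical column names; `RLE_TYPE` levels above 1 add the delta compression mentioned above:

```sql
-- Per-column encodings: RLE for the sorted/low-cardinality columns,
-- zstd for the high-cardinality measure column.
CREATE TABLE sales_rle (
    sale_id    bigint,
    sales_date date    ENCODING (compresstype=RLE_TYPE, compresslevel=2),
    region     text    ENCODING (compresstype=RLE_TYPE),
    amount     numeric ENCODING (compresstype=zstd, compresslevel=5)
)
WITH (appendoptimized=true, orientation=column)
DISTRIBUTED BY (sale_id);
```

Filling this table with data ordered by `sales_date` is what makes the RLE columns pay off: the longer the runs of identical values, the fewer run headers get written.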