What is the likelihood of multiple compactions overlapping in ScyllaDB?
In the open source version, Scylla recommends keeping up to 50% of disk space free for “compactions”. At the same time, the documentation states that each table is compacted independently of the others. Logically, this suggests that in an application with dozens (or even more) tables there’s only a small chance that so many compactions will coincide.
Is there a mathematical model for calculating how multiple compactions might overlap in an application with several tables? Based on a cursory analysis, it seems that the likelihood of many overlapping compactions is small, especially when we are dealing with dozens of independent tables.
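For concreteness, here is the sort of model I have in mind; the assumptions are purely illustrative (they are not from the Scylla docs): treat each table's large compaction as an independent process that is active some fraction p of the time, so the number of tables compacting simultaneously is Binomial(N, p).

```python
# Back-of-the-envelope overlap model (illustrative assumptions only):
# each of n_tables runs a large compaction independently, active a
# fraction p_active of the time, so the count of tables compacting
# at any instant is Binomial(n_tables, p_active).
from math import comb

def p_at_least_k_overlapping(n_tables: int, p_active: float, k: int) -> float:
    """Probability that at least k tables are compacting simultaneously."""
    return sum(comb(n_tables, i) * p_active**i * (1 - p_active)**(n_tables - i)
               for i in range(k, n_tables + 1))

# Example: 30 tables, each in a large compaction 5% of the time.
for k in (1, 3, 5, 10):
    print(f"P(>= {k:2d} overlapping) = {p_at_least_k_overlapping(30, 0.05, k):.6f}")
```

Under such a model the tail probability drops off quickly with k, which is what my cursory analysis suggested; whether the independence assumption actually holds is part of what I'm asking.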
You're absolutely right:
With the size-tiered compaction strategy a compaction may temporarily double the disk requirements. But it doesn't double the entire disk requirement, only the space taken by the sstables involved in this compaction (see also my blog post on size-tiered compaction and its space amplification). There is indeed a difference between "the entire disk usage" and just "the sstables involved in this compaction", for two reasons:
1. A compaction involves only some of a table's sstables - usually a few of roughly similar size - so the temporary extra space is bounded by those sstables, not by the whole data set.
2. Scylla divides each node's data into per-core shards, and each shard compacts its own sstables independently, so the shards don't normally run their biggest compactions at the same moment.
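A toy calculation with made-up numbers may help make the first point concrete: suppose a shard holds 925 GB of sstables across several tiers, and the compaction currently running merges the four 100 GB sstables of one tier. The temporary overhead is bounded by that one merge's output, not by the whole 925 GB:

```python
# Made-up sizes, just to illustrate the bound described above.
all_sstables_gb = [400, 100, 100, 100, 100, 25, 25, 25, 25, 25]  # whole shard
merge_inputs_gb = [100, 100, 100, 100]   # the one tier being compacted

resident_gb = sum(all_sstables_gb)       # 925 GB already on disk
# Worst case (no overwrites or tombstones): the merged output is as large
# as its inputs, and the inputs are deleted only after the output is sealed.
peak_extra_gb = sum(merge_inputs_gb)     # <= 400 GB temporary overhead
print(resident_gb, peak_extra_gb)        # far less than doubling all 925 GB
```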
The second reason cannot be counted on - since shards choose when to compact independently, if you're unlucky all shards may decide to compact the same table at exactly the same time and, worse, may happen to run their biggest compactions all at once. This "unluckiness" can even occur with 100% probability if you start a "major compaction" (nodetool compact).
The first reason, the one you asked about, is indeed more useful and reliable: beyond it being unlikely that all shards will choose to compact all sstables at exactly the same time, there is an important detail in Scylla's compaction algorithm which helps here: each shard only does one compaction of a (roughly) given size at a time. So if you have many roughly-equal-sized tables, no shard can be doing a full compaction of more than one of those tables at a time. This is guaranteed - it's not a matter of probability.
Of course, this "trick" only helps if you really do have many roughly-equal-sized tables. If one table is much bigger than the rest, or the tables have very different sizes, it won't help you much in controlling the maximum temporary disk use.
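Put roughly (this is my simplification, assuming full merges with no overwrites), the per-shard worst-case overhead under that guarantee scales with the largest single table rather than with the total data:

```python
# Simplified bound implied by "one compaction of a given size at a time":
# per shard, at most one table's full merge is in flight, so the temporary
# overhead is roughly that table's size, not the sum over all tables.
def worst_case_extra_gb(table_sizes_gb: list[float]) -> float:
    """Rough per-shard bound on temporary compaction space (see assumptions)."""
    return max(table_sizes_gb)

print(worst_case_extra_gb([30] * 30))        # 30 equal 30 GB tables -> ~30 GB
print(worst_case_extra_gb([600] + [10]*30))  # one huge table dominates -> ~600 GB
```

This is exactly why the trick stops helping once one table dominates the data set.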
In issue https://github.com/scylladb/scylla/issues/2871 I proposed an idea for how Scylla could guarantee that, when disk space is low, the sharding (point 2 above) is also used to reduce temporary disk space usage. We haven't implemented this idea, but instead implemented a better one - the "incremental compaction strategy", which does huge compactions in pieces ("incrementally") to avoid most of the temporary disk usage. See this blog post for how this new compaction strategy works, with graphs demonstrating how it lowers the temporary disk usage. Note that Incremental Compaction Strategy is currently part of the Scylla Enterprise version (it's not in the open-source version).
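To see why compacting in pieces caps the overhead, here is a crude accounting simulation - not ICS's actual mechanics, just the disk bookkeeping under the assumption that each output chunk is sealed before the corresponding input chunk is deleted:

```python
def peak_disk_during_compaction(input_gb: float, chunk_gb: float) -> float:
    """Peak disk usage while merging input_gb of sstables in chunk_gb pieces.

    Crude accounting only (assumption, not ICS internals): each chunk's
    output is written before the corresponding input chunk is deleted,
    which is the worst moment for disk usage.
    """
    remaining_input, output, peak = input_gb, 0.0, input_gb
    while remaining_input > 0:
        chunk = min(chunk_gb, remaining_input)
        output += chunk                             # output chunk sealed...
        peak = max(peak, remaining_input + output)  # ...input not yet freed
        remaining_input -= chunk                    # now free the input chunk
    return peak

print(peak_disk_during_compaction(400, 400))  # one big merge: peak ~800 GB
print(peak_disk_during_compaction(400, 1))    # incremental:   peak ~401 GB
```

The smaller the chunk, the closer the peak stays to the data's resident size, which is the effect the blog post's graphs show.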