ScyllaDB read latency increases every 15 days
Usually the read latency is around 3-4 ms, but every 15-20 days the latency shoots up to 100-150 ms, and because of this I need to restart the whole cluster. The read repair runs every midnight. I am unable to figure out the issue.
I have an 8-node Scylla cluster (version 4.1). All nodes are within the same data center, and I have 5 keyspaces.
In 3 of the 5 keyspaces I am storing JSON blobs after compressing them, and I am querying with CL = LOCAL_QUORUM for both reads and writes. For these keyspaces the read and write rates are roughly the same, and a 6-month TTL is set for each inserted record. Each keyspace has only 1 table.
For the other 2 keyspaces I am storing some internal configs; I write to the tables using CL = ALL and read from them using CL = LOCAL_ONE. The read-to-write ratio is 10:1 and no TTL is set for inserted records. Each keyspace has around 5-8 tables.
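For illustration only (a rough sketch; the keyspace, table, and column names below are assumptions, not from the original post), the two access patterns would look roughly like this in cqlsh, with the 6-month TTL expressed as ~15552000 seconds:

-- blob keyspaces: reads and writes at LOCAL_QUORUM, 6-month TTL on each insert
CONSISTENCY LOCAL_QUORUM;
INSERT INTO blob_ks.blobs (id, payload) VALUES ('some-key', 0xCAFEBABE) USING TTL 15552000;
SELECT payload FROM blob_ks.blobs WHERE id = 'some-key';

-- config keyspaces: writes at ALL, reads at LOCAL_ONE, no TTL
CONSISTENCY ALL;
INSERT INTO config_ks.settings (name, value) VALUES ('feature_flag', 'on');
CONSISTENCY LOCAL_ONE;
SELECT value FROM config_ks.settings WHERE name = 'feature_flag';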
Below is the configuration of all tables/keyspaces:
Replication factor = 3,
compaction = {'class': 'SizeTieredCompactionStrategy'},
compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'},
crc_check_chance = 1.0,
dclocal_read_repair_chance = 0.1,
default_time_to_live = 0,
gc_grace_seconds = 864000,
max_index_interval = 2048,
memtable_flush_period_in_ms = 0,
min_index_interval = 128,
read_repair_chance = 0.0,
speculative_retry = '99.0PERCENTILE'
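As a point of reference (the keyspace name ks1, table name blobs, column layout, and datacenter name dc1 are placeholders I am assuming, not taken from the post), these settings map onto CQL roughly as follows, with the replication factor set at the keyspace level and the remaining options at the table level:

CREATE KEYSPACE ks1
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};

CREATE TABLE ks1.blobs (
  id text PRIMARY KEY,
  payload blob
) WITH compaction = {'class': 'SizeTieredCompactionStrategy'}
  AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
  AND crc_check_chance = 1.0
  AND dclocal_read_repair_chance = 0.1
  AND default_time_to_live = 0
  AND gc_grace_seconds = 864000
  AND max_index_interval = 2048
  AND memtable_flush_period_in_ms = 0
  AND min_index_interval = 128
  AND read_repair_chance = 0.0
  AND speculative_retry = '99.0PERCENTILE';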
In the graph below: I restarted the cluster on 5th May. From 5th May till 20th May the read latency was around 3-4 ms, but from 20th May onwards it started increasing exponentially, and on the 23rd, when it reached 100-150 ms, I restarted the cluster and the read latency went back to normal.
Looking at the graph below, I suspect the increase in latency might be due to reads going to disk in that period, but the volume of disk reads is quite small.
1 Answer
Read repair runs all the time: for every operation, if there is a discrepancy between the replicas, read repair syncs the stale replica, on a per-partition basis.
There is not enough data here to figure it out; it is probably compaction. Best to ask on the mailing list or Slack, where you can upload the compaction and the reactor graphs.
Also, 4.1 is an obsolete version - move to 4.6.