The documentation for the Buffer table engine contains the following warning about replicated destination tables:
https://clickhouse.com/docs/en/engines/table-engines/special/buffer/
"If the destination table is replicated, some expected characteristics of replicated tables are lost when writing to a Buffer table. The random changes to the order of rows and sizes of data parts cause data deduplication to quit working, which means it is not possible to have a reliable ‘exactly once’ write to replicated tables."
From my understanding of how replicated tables apply block-level deduplication (*), this would imply that writes occur at least once.
Is this correct? Or is there a possibility that writes might be lost under rare circumstances?
(*)
https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication/
https://kb.altinity.com/altinity-kb-schema-design/insert_deduplication/
You lose two things with Buffer tables. One is the "automatic" deduplication that applies when the client inserts exactly the same block more than once. This allows for the possibility of duplicate data if the client believes a write has failed when it was in fact successful and attempts to rewrite the same batch. That can normally be managed at the client level.
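To make the deduplication point concrete, here is a toy Python model, not ClickHouse code. It assumes the destination identifies a block by a hash of its rows in insertion order; `block_id` and `ToyReplicatedTable` are hypothetical names invented for this sketch, and the real checksum and deduplication window work differently in detail.

```python
import hashlib


def block_id(rows):
    """Toy stand-in for a block checksum: hash of the rows in insertion order."""
    h = hashlib.sha256()
    for row in rows:
        h.update(repr(row).encode())
    return h.hexdigest()


class ToyReplicatedTable:
    """Hypothetical model: remembers block ids and skips identical blocks."""

    def __init__(self):
        self.seen = set()
        self.rows = []

    def insert(self, rows):
        bid = block_id(rows)
        if bid in self.seen:
            return False  # identical block, silently deduplicated
        self.seen.add(bid)
        self.rows.extend(rows)
        return True


batch = [(1, "a"), (2, "b"), (3, "c")]

# Direct insert: the client's retry of the identical block is ignored.
direct = ToyReplicatedTable()
direct.insert(batch)
direct.insert(batch)  # client retry after a lost acknowledgement
direct_count = len(direct.rows)

# Via a buffer: the first flush merges the batch with rows from another insert,
# and the retry is flushed alone in a different row order, so the destination
# sees two different blocks and stores the batch twice.
buffered = ToyReplicatedTable()
buffered.insert(batch + [(4, "d")])
buffered.insert(list(reversed(batch)))
buffered_count = len(buffered.rows)

print(direct_count, buffered_count)  # prints "3 7"
```

In this sketch the direct path stores 3 rows, while the buffered path stores 7 (the batch twice plus the unrelated row), which is the duplicate-data case described above.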
The other is the possibility of data loss if something happens to the ClickHouse server while data is only in the "in memory" buffer table before one of the flush conditions is met.
insert_quorum only applies to ReplicatedMergeTree tables, not Buffer tables, so until the flush there is only one copy of the data, held in memory, that has already been acknowledged to the client as "written" but has not yet been stored on disk or replicated. Using Buffer tables means accepting the possibility of this data loss if the server crashes for some reason.
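The data-loss case can be sketched the same way. This is a toy model under simplifying assumptions: a single row-count threshold stands in for the real flush conditions, and `ToyBuffer`, `max_rows`, and `crash` are hypothetical names for illustration only.

```python
class ToyBuffer:
    """Hypothetical in-memory buffer in front of a durable destination list."""

    def __init__(self, destination, max_rows):
        self.destination = destination
        self.max_rows = max_rows  # assumed single flush condition
        self.pending = []

    def insert(self, rows):
        self.pending.extend(rows)
        if len(self.pending) >= self.max_rows:
            self.flush()
        return "ok"  # acknowledged even if the rows are still only in memory

    def flush(self):
        self.destination.extend(self.pending)
        self.pending = []

    def crash(self):
        self.pending = []  # anything not yet flushed is gone


destination = []
buf = ToyBuffer(destination, max_rows=4)

acks = [
    buf.insert([(1,), (2,)]),  # stays in memory
    buf.insert([(3,), (4,)]),  # threshold reached, 4 rows flushed
    buf.insert([(5,)]),        # stays in memory
]
buf.crash()

acknowledged_rows = 5
durable_rows = len(destination)
print(acknowledged_rows, durable_rows)  # prints "5 4"
```

All three inserts were acknowledged, yet only 4 of the 5 rows survive the crash. So with a Buffer table in front, writes are neither "exactly once" nor strictly "at least once".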