ClickHouse replicated tables: block-level deduplication with Buffer tables

Posted on 2025-02-11 23:22:54

The documentation for the Buffer table engine contains the following warning about replicated destination tables:

https://clickhouse.com/docs/en/engines/table-engines/special/buffer/

"If the destination table is replicated, some expected characteristics of replicated tables are lost when writing to a Buffer table. The random changes to the order of rows and sizes of data parts cause data deduplication to quit working, which means it is not possible to have a reliable ‘exactly once’ write to replicated tables."

From my understanding of how replicated tables apply block-level deduplication (*), this would imply at-least-once semantics: every write is applied, possibly more than once.

Is this correct? Or is there a possibility that writes might be lost under rare circumstances?

(*)

https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication/

https://kb.altinity.com/altinity-kb-schema-design/insert_deduplication/

Comments (1)

中性美 2025-02-18 23:22:54

You lose two things with Buffer tables. One is the "automatic" deduplication that applies when the client inserts exactly the same block more than once. Without it, duplicate data becomes possible if the client believes a write has failed when it in fact succeeded and retries the same batch. That can normally be managed at the client level.
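To make the first point concrete, here is a toy model in Python, not ClickHouse's actual implementation. It assumes deduplication works by remembering a hash of each inserted block, with SHA-256 standing in for whatever hash the server really uses; `ToyReplicatedTable` and `insert_block` are hypothetical names invented for this sketch.

```python
import hashlib


class ToyReplicatedTable:
    """Toy stand-in for a replicated table that skips blocks it has already seen."""

    def __init__(self):
        self.rows = []
        self.seen_block_hashes = set()

    def insert_block(self, block):
        # The hash covers row content AND row order, so the same rows
        # arriving in a different order or block size look like new data.
        digest = hashlib.sha256(repr(list(block)).encode()).hexdigest()
        if digest in self.seen_block_hashes:
            return False  # identical block: skipped
        self.seen_block_hashes.add(digest)
        self.rows.extend(block)
        return True


batch = [(1, "a"), (2, "b"), (3, "c")]

# Direct inserts: the client's retry of the identical batch is deduplicated.
direct = ToyReplicatedTable()
direct.insert_block(batch)
direct.insert_block(batch)  # retry after a presumed failure
direct_row_count = len(direct.rows)

# Through a buffer: flushed blocks may be cut and ordered differently
# from what the client sent, so the retry no longer matches any stored hash.
buffered = ToyReplicatedTable()
buffered.insert_block(batch[:2])  # first flush carried two rows
buffered.insert_block(batch[2:])  # second flush carried the third row
buffered.insert_block([batch[2], batch[0], batch[1]])  # retry, flushed reordered
buffered_row_count = len(buffered.rows)

print(direct_row_count, buffered_row_count)
```

In this sketch the direct path ends with 3 rows and the buffered path with 6. Managing this at the client level would mean the client, not the server, has to decide whether a batch was already written before retrying.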

The other is the possibility of data loss if something happens to the ClickHouse server while data exists only in the in-memory Buffer table, before one of the flush conditions is met. insert_quorum applies only to ReplicatedMergeTree tables, not to Buffer tables, so until the flush there is a single in-memory copy of data that has already been acknowledged to the client as "written" but has not yet been stored on disk or replicated. Using Buffer tables means accepting the possibility of this data loss if the server crashes for some reason.
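The loss window can be sketched the same way. This is a simplified Python model that assumes a single flush condition (a row-count threshold); the real Buffer engine has several time, row, and byte thresholds. `ToyBuffer`, `max_rows`, and `crash` are hypothetical names for illustration only.

```python
class ToyBuffer:
    """Toy in-memory buffer that flushes to a destination once enough rows pile up."""

    def __init__(self, destination, max_rows):
        self.destination = destination  # a list standing in for the on-disk table
        self.max_rows = max_rows        # assumed single flush condition
        self.pending = []               # rows held only in memory

    def insert(self, rows):
        self.pending.extend(rows)
        if len(self.pending) >= self.max_rows:
            self.flush()
        # The client is acknowledged even if the rows are still only in memory.
        return "ok"

    def flush(self):
        self.destination.extend(self.pending)
        self.pending = []

    def crash(self):
        # A server crash discards whatever has not been flushed yet.
        lost = len(self.pending)
        self.pending = []
        return lost


disk = []
buf = ToyBuffer(disk, max_rows=4)

# Six single-row inserts: the first four trigger a flush, the last two stay in memory.
acks = [buf.insert([(i, "x")]) for i in range(6)]
lost_rows = buf.crash()

print(len(acks), len(disk), lost_rows)
```

All six inserts were acknowledged, but only four rows reached the destination; the two that were still pending at the time of the crash are gone. So, under this model, writes through a Buffer table are neither exactly-once nor strictly at-least-once.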
