Debezium Server with the Azure Event Hubs sink: sending messages to multiple partition keys

Posted 2025-01-09 19:28:39


I'm implementing CDC for an Azure Database for PostgreSQL, and I want the events sent to Azure Event Hubs. My current plan is to use Debezium Server with the Event Hubs sink. However, I want to enforce ordering of events per table. From this article I know I can do this with a single topic that has multiple partitions, as long as events from a given table are always sent to the same partition.

However, Debezium doesn't seem to provide a nice way to handle this. You can specify a single partition key that all events are sent to, but not a key per event dynamically. The only other options I saw that could solve this are a custom sink implementation, or a custom EventHubProducerClient implementation passed into the config.

What are my options for handling this? Is there another way to architect this solution so that I don't have to use partition keys? Or is a custom sink implementation going to be my best bet? Or should I just drop Debezium and write a custom listener/publisher?

Context / requirements

  • Typically, to run Debezium you need a Kafka instance. If possible I don't
    want to use Kafka, since I'm already planning on using Event Hubs; it
    seems redundant, and it would be another service to maintain.
  • FIFO ordering of events by table when read by consumers of the event hub.
  • All logical database changes are turned into events.
  • No Java developers on the team, so custom (Java) implementations would be
    a stretch for our expertise.
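If a custom listener/publisher does turn out to be the fallback, the ordering requirement only needs the originating table name extracted from each change event and used as the Event Hubs partition key. A minimal sketch of that extraction step (the function name and the trimmed sample event are illustrative; the payload.source.schema and payload.source.table fields are standard in Debezium change events):

```python
import json

def partition_key_for(event_json: str) -> str:
    """Derive a per-table partition key from a Debezium change event.

    Debezium change events carry the originating table in the "source"
    block of the payload; using that name as the Event Hubs partition
    key keeps each table's events on one partition, preserving FIFO
    order per table.
    """
    event = json.loads(event_json)
    source = event["payload"]["source"]
    return f'{source["schema"]}.{source["table"]}'

# A trimmed Debezium-style event (only the fields this sketch reads).
sample = json.dumps({
    "payload": {
        "op": "u",
        "source": {"schema": "public", "table": "orders"},
        "after": {"id": 42, "status": "shipped"},
    }
})
print(partition_key_for(sample))  # public.orders
```

A publisher would then pass this string as the partition key when batching events to Event Hubs, so the service's own hashing pins each table to one partition.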


Comments (1)

可遇━不可求 2025-01-16 19:28:39


Example configuration:

debezium.source.table.include.list=dbo.TableOne,dbo.TableTwo,dbo.TableThree
debezium.source.transforms=PartitionRouting
debezium.source.transforms.PartitionRouting.type=io.debezium.transforms.partitions.PartitionRouting
debezium.source.transforms.PartitionRouting.partition.payload.fields=source.table,source.table,source.table
debezium.source.transforms.PartitionRouting.partition.topic.num=32

The partition.payload.fields setting (see docs) determines which field in the event should be used by the hash function that distributes events across partitions. source.table will be the table name without the schema, so TableOne/TableTwo/TableThree in this example.

Given the above configuration, all events from TableOne/TableTwo/TableThree will be sent to exactly one partition (probably to three different ones, one per table). So only 3 of the configured 32 partitions would be used.

If the setting was fields=source.table,source.table,change.Id, then all events from TableOne and TableTwo would be sent to their own partitions, while events from TableThree would be divided among all 32 partitions (but all events for a particular row would always be sent to the same partition).
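That routing behaviour can be modelled with a toy hash. This is not Debezium's actual hash function, only an illustration of the "same field values, same partition" property the answer describes:

```python
import zlib

def route(fields: list[str], num_partitions: int = 32) -> int:
    """Toy stand-in for PartitionRouting: concatenate the configured
    payload fields, hash them, and map the result onto the partition
    count. (Debezium's real hash differs; only the property matters.)"""
    key = "|".join(fields)
    return zlib.crc32(key.encode()) % num_partitions

# With only source.table as a payload field, each table hashes to one
# fixed partition, so at most 3 of the 32 partitions are ever used.
used = {route([t]) for t in ["TableOne", "TableTwo", "TableThree"]}
print(len(used))  # at most 3

# Adding a per-row field (like change.Id) spreads one table's events
# across partitions while keeping any given row pinned to one.
same_row = route(["TableThree", "17"]) == route(["TableThree", "17"])
print(same_row)  # True
```

The deterministic hash is what gives per-table (or per-row) FIFO ordering without any coordination between producer and consumers.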
