Debezium Server with the Azure Event Hubs sink: sending messages to multiple partition keys
I'm implementing CDC for a PostgreSQL Azure database, and I want the events to be sent to Azure Event Hubs. My current plan is to use Debezium Server with the Event Hubs sink. However, I want to enforce the order of events per table. From this article I know I can do this by having a single topic with multiple partitions and always sending the events from a given table to the same partition.

However, Debezium doesn't seem to provide a nice way to handle this. You can specify a single partition key that all events are sent to, but not dynamically per event. The only other things I saw that could solve this are a custom sink implementation, or a custom EventHubProducerClient implementation passed into the config.

What are my options for handling this? Is there another way to architect this solution so that I don't have to use partition keys? Or is a custom sink implementation my best bet? Or should I just drop Debezium and write a custom listener/publisher?
Context / requirements
- Typically, to run Debezium you need a Kafka instance running. If possible I don't want to use Kafka, as I'm already planning on using Event Hubs; it would be redundant, and it is another service that needs to be maintained.
- FIFO ordering of events per table when read by consumers of the event hub.
- All logical database changes are turned into events.
- There are no Java developers on the team, so a custom (Java) implementation would be a stretch for our expertise.
1 Answer
Example configuration:
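The configuration snippet the answer refers to was not preserved in this copy. Based on the surrounding discussion (hashing `source.table`, 32 partitions), it likely used Debezium's `PartitionRouting` SMT; the sketch below follows the property names from the Debezium docs, while the hub name and connection-string placeholder are made-up examples:

```properties
# Debezium Server: Event Hubs sink (hub name / connection string are placeholders)
debezium.sink.type=eventhubs
debezium.sink.eventhubs.connectionstring=${EVENTHUBS_CONNECTION_STRING}
debezium.sink.eventhubs.hubname=my-hub

# Route events to partitions by table using the PartitionRouting SMT
debezium.transforms=PartitionRouting
debezium.transforms.PartitionRouting.type=io.debezium.transforms.partitions.PartitionRouting
debezium.transforms.PartitionRouting.partition.payload.fields=source.table
debezium.transforms.PartitionRouting.partition.topic.num=32
```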
The `partition.payload.fields` setting (see the docs) determines which fields of the event the hash function should use to distribute events across partitions. `source.table` is the table name without the schema, so `TableOne`/`TableTwo`/`TableThree` in this example.

Given the above configuration, all events from `TableOne`/`TableTwo`/`TableThree` will each be sent to exactly one partition (probably to three different ones, one per table). So only 3 of the configured 32 partitions would be used.

If the setting were `fields=source.table,source.table,change.Id`, then all events from `TableOne` and `TableTwo` would be sent to their own partitions, while the events from `TableThree` would be divided evenly between all 32 partitions (but all events for a particular row would always be sent to the same partition).
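To illustrate the mechanism (this is a simplified sketch, not Debezium's actual implementation), the routing boils down to hashing the configured payload field and taking it modulo the partition count, so every event from a given table always lands on the same partition:

```java
import java.util.List;

public class PartitionRoutingSketch {

    // Hypothetical stand-in for the PartitionRouting SMT: hash the routing
    // key (e.g. the value of source.table) modulo the partition count.
    static int partitionFor(String routingKey, int numPartitions) {
        // floorMod keeps the result non-negative even when hashCode() is negative
        return Math.floorMod(routingKey.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 32;
        for (String table : List.of("TableOne", "TableTwo", "TableThree")) {
            System.out.println(table + " -> partition " + partitionFor(table, partitions));
        }
    }
}
```

Because the function is deterministic, per-table FIFO ordering holds as long as each partition is read by a single consumer in order.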