Automatically resolve the schema from Confluent magic bytes using ABRiS for Spark

Posted on 2025-02-09 03:43:42


Is there a way to resolve the schema automatically from the header at the start of each message (the magic byte followed by the schema id for that message)?

As we know, Confluent Avro prepends the schema id to the message, so each message carries its own schema id.
ABRiS adds this header (the magic byte plus the schema id) when encoding records, in order to support the Confluent format.
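
For reference, the Confluent framing is one magic byte (`0`) followed by a 4-byte big-endian schema id, and then the Avro body. A minimal sketch of reading that header in plain Python follows; the function name `split_confluent_header` is hypothetical and is not part of ABRiS:

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0
HEADER_SIZE = 5  # 1 magic byte + 4-byte big-endian schema id


def split_confluent_header(payload: bytes) -> Tuple[int, bytes]:
    """Return (schema_id, avro_body) for a Confluent-framed message."""
    if len(payload) < HEADER_SIZE:
        raise ValueError("payload too short for a Confluent header")
    # ">bI": big-endian, one signed byte, then one unsigned 32-bit int.
    magic, schema_id = struct.unpack(">bI", payload[:HEADER_SIZE])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, payload[HEADER_SIZE:]
```

The remaining bytes are the Avro-encoded record, which still needs the writer schema for that id in order to be decoded.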

However, when decoding Confluent-encoded messages, we must pass the schema configuration manually beforehand, which makes it hard to support cases where messages with different schemas arrive (think of a record name strategy or even schema evolution).

I could implement a solution that manually parses the header of each message and dynamically constructs the schema configuration for each message (or for each group of messages, to avoid creating millions of config objects).
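
To make the grouping idea concrete, here is a rough sketch in plain Python, assuming the Confluent framing (magic byte `0`, then a 4-byte big-endian schema id). It buckets raw message values by schema id so that only one decoder configuration per distinct id would be needed. The names `schema_id_of` and `group_by_schema_id` are hypothetical, and the decoding of each group is deliberately left out:

```python
import struct
from collections import defaultdict
from typing import Dict, Iterable, List


def schema_id_of(payload: bytes) -> int:
    # Byte 0 is the magic byte; bytes 1..4 hold the schema id (big-endian).
    if len(payload) < 5 or payload[0] != 0:
        raise ValueError("not a Confluent-framed message")
    return struct.unpack(">I", payload[1:5])[0]


def group_by_schema_id(payloads: Iterable[bytes]) -> Dict[int, List[bytes]]:
    """Bucket raw message values by the schema id in their header."""
    groups: Dict[int, List[bytes]] = defaultdict(list)
    for payload in payloads:
        groups[schema_id_of(payload)].append(payload)
    return dict(groups)
```

Each key of the result is a schema id, so a batch with a million messages but only a handful of distinct schemas needs only a handful of configurations.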

Is there something like this available out of the box that I am missing?

As an example use case, suppose in a micro-batch we read from a topic using RecordNameStrategy and in the same batch we receive different messages with different schemas and also slightly different versions for the same schema subject:

| message value | schema subject | schema version | schema id embedded in the message magic byte |
| --- | --- | --- | --- |
| `*confluent-avro*` | my.record.type1 | v1 | 1 |
| `*confluent-avro*` | my.record.type1 | v2 | 2 |
| `*confluent-avro*` | my.record.type2 | v1 | 3 |
| `*confluent-avro*` | my.record.type2 | v2 | 4 |