Parsing variable-length descriptors from a byte stream and acting on their type

Posted on 2024-08-08 05:46:40

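I am reading from a byte stream that contains a series of variable-length descriptors, which I represent in my code as various structs/classes. Each descriptor has a fixed-length header, common to all descriptors, that identifies its type.

Is there an appropriate model or pattern I can use to best parse and represent each descriptor, and then perform an appropriate action depending on its type?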

羁客 2024-08-15 05:46:40

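I've written lots of these types of parsers.

I would recommend that you read the fixed-length header, then use a simple switch-case to dispatch to the correct constructor for your structures, passing the fixed header and the stream to that constructor so that it can consume the variable part of the stream.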
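A minimal sketch of that dispatch, written in Python only for brevity. The header layout (a 1-byte type tag followed by a 2-byte big-endian payload length), the type codes, and the TextDescriptor/BlobDescriptor classes are all assumptions made up for illustration; they do not come from the question.

```python
import io
import struct
from collections import namedtuple

# Hypothetical fixed header: 1-byte type tag, 2-byte big-endian payload length.
HEADER = struct.Struct(">BH")
Header = namedtuple("Header", "type length")


class TextDescriptor:
    def __init__(self, header, stream):
        # The constructor consumes the variable part of the stream itself.
        self.text = stream.read(header.length).decode("utf-8")


class BlobDescriptor:
    def __init__(self, header, stream):
        self.data = stream.read(header.length)


def read_descriptor(stream):
    raw = stream.read(HEADER.size)
    if len(raw) < HEADER.size:
        return None  # end of stream
    header = Header(*HEADER.unpack(raw))
    # Python's stand-in for a simple switch-case on the type tag.
    if header.type == 0x01:
        return TextDescriptor(header, stream)
    elif header.type == 0x02:
        return BlobDescriptor(header, stream)
    raise ValueError(f"unknown descriptor type {header.type:#x}")


stream = io.BytesIO(b"\x01\x00\x02hi" + b"\x02\x00\x03abc")
first = read_descriptor(stream)
second = read_descriptor(stream)
```

Passing the already-parsed header into the constructor means each descriptor class only has to know about its own variable part.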

信仰 2024-08-15 05:46:40

This is a common problem in file parsing. Typically, you read the known part of the descriptor (which luckily is fixed-length in this case, but isn't always) and branch on it. I generally use a strategy pattern here, since I usually want the system to be broadly flexible, but a straight switch or factory may work as well.

The other question is: do you control and trust the downstream code? Meaning: the factory / strategy implementation? If you do, then you can just give them the stream and the number of bytes you expect them to consume (perhaps putting some debug assertions in place, to verify that they do read exactly the right amount).

If you can't trust the factory/strategy implementation (perhaps you allow the user-code to use custom deserializers), then I would construct a wrapper on top of the stream (example: SubStream from protobuf-net), that only allows the expected number of bytes to be consumed (reporting EOF afterwards), and doesn't allow seek/etc operations outside of this block. I would also have runtime checks (even in release builds) that enough data has been consumed - but in this case I would probably just read past any unread data - i.e. if we expected the downstream code to consume 20 bytes, but it only read 12, then skip the next 8 and read our next descriptor.
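As a rough illustration of such a wrapper (this is not protobuf-net's SubStream, just a hypothetical BoundedReader written in Python), the sketch below caps reads at a fixed byte count, reports EOF afterwards, and lets the outer parser skip whatever the downstream code left unread. It offers no seek at all, which is the simplest way to rule out seeking outside the block.

```python
import io


class BoundedReader:
    """Read-only view that exposes at most `limit` bytes of `stream`."""

    def __init__(self, stream, limit):
        self._stream = stream
        self._remaining = limit

    def read(self, size=-1):
        if size is None or size < 0 or size > self._remaining:
            size = self._remaining
        data = self._stream.read(size)
        self._remaining -= len(data)
        return data  # b"" once the limit is reached, i.e. EOF

    def skip_rest(self):
        """Discard whatever the downstream code left unread; return the count."""
        skipped = 0
        while self._remaining:
            chunk = self.read(min(self._remaining, 4096))
            if not chunk:
                break  # the underlying stream ended early
            skipped += len(chunk)
        return skipped


source = io.BytesIO(bytes(range(20)) + b"NEXT")
block = BoundedReader(source, 20)
consumed = block.read(12)    # downstream code reads only 12 of the 20 bytes
skipped = block.skip_rest()  # the outer parser skips the remaining 8
eof = block.read(1)          # the wrapper now reports EOF
following = source.read(4)   # the outer parser is aligned on the next descriptor
```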

To expand on that; one strategy design here might have something like:

interface ISerializer {
    object Deserialize(Stream source, int bytes);
    void Serialize(Stream destination, object value);
}

You might build a dictionary (or just a list if the number is small) of such serializers keyed by expected marker, resolve the serializer for the marker you read, and then invoke its Deserialize method. If you don't recognise the marker, then do one of:

skip the given number of bytes
throw an error
store the extra bytes in a buffer somewhere (allowing for round-trip of unexpected data)
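A small Python sketch of the dictionary lookup and of the third option (keeping unrecognised data for round-tripping). It mirrors only the Deserialize half of the interface above; the marker values, the header layout (1-byte marker, 2-byte big-endian length) and both serializer classes are hypothetical.

```python
import io
import struct

HEADER = struct.Struct(">BH")  # hypothetical: 1-byte marker, 2-byte length


class Utf8Serializer:
    def deserialize(self, source, nbytes):
        return source.read(nbytes).decode("utf-8")


class UInt32Serializer:
    def deserialize(self, source, nbytes):
        return struct.unpack(">I", source.read(nbytes))[0]


# One serializer per expected marker.
SERIALIZERS = {0x01: Utf8Serializer(), 0x02: UInt32Serializer()}


def parse(source):
    values, unknown = [], []
    while True:
        raw = source.read(HEADER.size)
        if len(raw) < HEADER.size:
            break  # end of stream
        marker, length = HEADER.unpack(raw)
        serializer = SERIALIZERS.get(marker)
        if serializer is None:
            # Unrecognised marker: keep the raw bytes so they can be written back.
            unknown.append((marker, source.read(length)))
            continue
        values.append(serializer.deserialize(source, length))
    return values, unknown


data = b"\x01\x00\x02ok" + b"\x7f\x00\x03???" + b"\x02\x00\x04\x00\x00\x01\x00"
values, unknown = parse(io.BytesIO(data))
```

Replacing the `unknown.append(...)` line with a bare `source.read(length)` gives the skip behaviour, and raising an exception there gives the error behaviour.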

As a side-note to the above - this approach (strategy) is useful if the system is determined at runtime, either via reflection or via a runtime DSL (etc). If the system is entirely predictable at compile-time (because it doesn't change, or because you are using code-generation), then a straight switch approach may be more appropriate - and you probably don't need any extra interfaces, since you can inject the appropriate code directly.

灰色世界里的红玫瑰 2024-08-15 05:46:40

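One key thing to remember: if you are reading from a stream and do not detect a valid header/message, discard only the first byte before retrying. Many times I have seen an entire packet or message discarded instead, which can cause valid data to be lost.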
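A sketch of that resynchronisation rule in Python, assuming a made-up framing (a 2-byte magic value, a 1-byte payload length, then the payload). The framing is purely illustrative; the point is the single-byte step when no valid header is found.

```python
# Hypothetical framing: 2-byte magic, 1-byte payload length, then the payload.
MAGIC = b"\xAA\x55"


def extract_messages(buf):
    """Return (payloads, leftover) parsed from a bytes buffer."""
    payloads = []
    pos = 0
    while len(buf) - pos >= 3:
        if buf[pos:pos + 2] != MAGIC:
            pos += 1  # no valid header here: drop ONE byte and retry
            continue
        length = buf[pos + 2]
        end = pos + 3 + length
        if end > len(buf):
            break  # header looks valid but the payload is incomplete
        payloads.append(buf[pos + 3:end])
        pos = end
    return payloads, buf[pos:]


noise = b"\x00\xAA"  # junk whose last byte looks like the start of a header
stream = noise + MAGIC + b"\x02hi" + MAGIC + b"\x01!"
payloads, leftover = extract_messages(stream)
```

Because the scan advances one byte at a time, the real header that begins right after the misleading 0xAA byte is still found; discarding a whole header-sized chunk at that point would have skipped over it.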

三人与歌 2024-08-15 05:46:40

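This sounds like it might be a job for Factory Method, or perhaps Abstract Factory. Based on the header, you choose which factory method to call, and that returns an object of the relevant type.

Whether this is better than simply adding constructors to a switch statement depends on the complexity and uniformity of the objects you're creating.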

小巷里的女流氓 2024-08-15 05:46:40

I would suggest:

fifo = Fifo.new

while(fd is readable) {
  read everything off the fd and stick it into fifo
  if (the front of the fifo has a valid header and 
      the fifo is big enough for payload) {

      dispatch constructor, remove bytes from fifo
  }
}

With this method:

you can do some error checking for bad payloads, and potentially throw bad data away
data is not waiting on the fd's read buffer (can be an issue for large payloads)
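One possible runnable reading of the pseudocode above, in Python. The fd is replaced by explicit feed() calls so the sketch stays self-contained, and the header layout (1-byte type, 2-byte big-endian length) and the handlers are assumptions. Header validation is reduced to a dictionary lookup, so an unknown type raises KeyError here; real code would check the header before dispatching.

```python
import struct

HEADER = struct.Struct(">BH")  # hypothetical: 1-byte type, 2-byte payload length


class DescriptorFifo:
    def __init__(self, handlers):
        self._fifo = bytearray()
        self._handlers = handlers  # maps type tag -> callable(payload)

    def feed(self, chunk):
        """Append freshly read bytes, then dispatch every complete descriptor."""
        self._fifo += chunk
        results = []
        # A loop rather than a single `if`: one read may deliver several descriptors.
        while len(self._fifo) >= HEADER.size:
            kind, length = HEADER.unpack_from(self._fifo)
            total = HEADER.size + length
            if len(self._fifo) < total:
                break  # payload not complete yet, wait for more data
            payload = bytes(self._fifo[HEADER.size:total])
            del self._fifo[:total]  # remove the consumed bytes from the fifo
            results.append(self._handlers[kind](payload))
        return results


fifo = DescriptorFifo({0x01: lambda p: p.decode("utf-8"), 0x02: len})
part1 = fifo.feed(b"\x01\x00\x05he")             # header complete, payload is not
part2 = fifo.feed(b"llo\x02\x00\x02ab\x01\x00")  # finishes one, delivers another
```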

浅唱々樱花落 2024-08-15 05:46:40

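If you'd like it to be nice OO, you can use the visitor pattern in an object hierarchy. This is how I did it (for identifying packets captured off the network, pretty much the same thing you might need):

a huge object hierarchy, with one parent class
each class has a static constructor that registers with its parent, so the parent knows about its direct children (this was C++; I think this step is not needed in languages with good reflection support)
each class has a static constructor method that takes the remaining part of the byte stream and, based on that, decides whether it is its responsibility to handle that data
when a packet comes in, I simply pass it to the static constructor method of the main parent class (called Packet), which in turn checks all of its children to see whether it is their responsibility to handle that packet; this goes on recursively until a class at the bottom of the hierarchy returns the instantiated class
each of the static "constructor" methods cuts its own header from the byte stream and passes only the payload down to its children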
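The steps above could be sketched in Python roughly as follows. This is not the author's C++ code: the explicit registration step is replaced by Python's built-in `__subclasses__()`, and the Outer/InnerA/InnerB classes with their one-byte toy headers are invented for illustration and do not correspond to any real protocol.

```python
class Packet:
    HEADER_LEN = 0  # the root class has no header of its own

    def __init__(self, payload):
        self.payload = payload

    @classmethod
    def accepts(cls, data):
        return True  # the root accepts everything

    @classmethod
    def parse(cls, data):
        """Return the most specific instance for `data`, or None if not ours."""
        if not cls.accepts(data):
            return None
        payload = data[cls.HEADER_LEN:]       # cut off our own header
        for child in cls.__subclasses__():    # direct children only
            found = child.parse(payload)      # children see only the payload
            if found is not None:
                return found
        return cls(payload)                   # no child claimed it


class Outer(Packet):
    HEADER_LEN = 1

    @classmethod
    def accepts(cls, data):
        return data[:1] == b"\x04"  # toy one-byte header


class InnerA(Outer):
    HEADER_LEN = 1

    @classmethod
    def accepts(cls, data):
        return data[:1] == b"\x11"


class InnerB(Outer):
    HEADER_LEN = 1

    @classmethod
    def accepts(cls, data):
        return data[:1] == b"\x06"


deep = Packet.parse(b"\x04\x11data")
partial = Packet.parse(b"\x04\x99zz")
unknown = Packet.parse(b"\xffzz")
```

Adding a new type means defining one more subclass with its own `accepts`; no existing class needs to change.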

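The advantage of this approach is that you can add new types anywhere in the object hierarchy without needing to see or change any other class. It worked remarkably well for packets. It goes like this: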