Decoding a byte stream

Posted on 2024-07-10 14:41:38

I have a series of messages that are defined by independent structs. These structs share a common header and are sent between applications. I am creating a decoder that will take the raw data captured from messages built with these structs and decode/parse it into plain text.

I have over 1000 different messages that need to be decoded, so I am not sure whether defining all the struct formats in XML and then using XSL or some other translation is the way to go, or whether there is a better way to do this.

There are times when I will need to decode logs containing over a million messages, so performance is a concern.

Any recommendations for techniques/tools/algorithms to go about creating the decoder/parser?

struct:
struct {
  dword messageid;
  dword datavalue1;
  dword datavalue2;
} struct1;

Example raw data:

010101010A0A0A0A0F0F0F0F

Decoded message (desired output):

message id: 0x01010101, datavalue1: 0x0A0A0A0A, datavalue2: 0x0F0F0F0F

I am using C++ to do this development.

锦欢 2024-07-17 14:41:38

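As for "performance": if you are doing disk I/O and possibly display I/O, I doubt your parser/decoder will make much difference unless you use a truly horrible algorithm.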

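I'm also not sure what the problem is. As the question stands, you have 3 DWORDs in a struct, and you claim there are over 1000 unique messages based on these values.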

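Your decoded message doesn't suggest you need any kind of parsing; straight output looks like it would work (converting the bytes to the ASCII representation of their hex values).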

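If you do have a mapping from values to strings, a big switch statement is simple. Or, if you want to be able to add these dynamically or change the display, I would put the key/value pairs (a map) in a config file (text, XML, etc.) and then do the lookup while reading the log file/raw data.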

I would use a map in this case.

If you give another concrete example of values and the decoded output, I may be able to suggest something more appropriate.

像你 2024-07-17 14:41:38

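If you already have the message definitions in the syntax you used in your example, you definitely should not try to convert them by hand into some other syntax (XML or otherwise).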

Instead, you should try to write a compiler that takes these message definitions and compiles them into decoder functions.

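Nowadays I would recommend ANTLR as the parser generator, with any language ANTLR supports (Java, Python, Ruby, C#, C++) for the actual compiler. That compiler should then output C code that does the entire decoding and pretty-printing.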

南汐寒笙箫 2024-07-17 14:41:38

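You could use yacc or ANTLR: add suitable parsing rules, populate some data structure (probably a tree) from them while parsing, then walk that data structure and do whatever you like.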

呆萌少年 2024-07-17 14:41:38

I'm going to assume that all you need to do is format the records and output them.

Use a custom code generator. The generated code will look something like this:

typedef struct { dword messageid; } Header;

//repeated for each record type
typedef struct {
    dword messageid;
    // <members here>
} Record_##;
//END


void Process(Input inp, Output out) {
    char buffer[BIG_ENOUGH];
    char *offset;

    // start at the end so the first pass through the loop fills the buffer
    offset = &buffer[BIG_ENOUGH];

    while(notEnd) {
        if(&offset[sizeof(LargestStruct)] >= &buffer[BIG_ENOUGH]) {
            // move remaining buffer to start, reset offset, and fill tail from inp
        }

        Header *hpt = (Header*)offset;

        switch(hpt->messageid)
        {
            //repeated for each record type
            case <record ID for given type>:
            {
                Record_##* rpt = (Record_##*)offset;
                out.format("name1: %x, ...\n", rpt->name1, ...);
                offset += sizeof(Record_##);
                break;
            }
            //END
            default:
                // unknown message ID: report it and stop, since there is no length field to skip it
                return;
        }
    }
}

Most of that is boilerplate, so writing a program to generate it shouldn't be too hard.

If you need more processing, I think this idea could be tweaked some to make that work as well.

Edit: after rereading the question, it looks like you might already have the structs defined. In that case you can just #include them and use them directly. However, you then end up with the problem of how to parse the structs to generate the input to the formatting function. Awk or sed might be handy there.
