Frameworks for building parsers for structured binary data?
I have some experience with Pragmatic-Programmer-style code generation: specifying a data structure in a platform-neutral format and writing templates for a code generator that consumes these data structure files and produces code that pulls raw bytes into language-specific data structures, scales the numeric data, prints out the data, and so on. The nice pragmatic(TM) ideas are that (a) I can change data structures by modifying my specification file and regenerating the source (which is DRY and all that) and (b) I can add functions for all of my structures just by modifying my templates.
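To make the idea concrete, here is a minimal sketch of specification-driven parsing, written in Python for brevity rather than in one of the target languages. The record layout, the field names, and the `make_parser` helper are all hypothetical, and the sketch builds the parser at run time, whereas a real generator would emit source code from templates.

```python
import struct

# Hypothetical platform-neutral spec: (field name, struct format code, scale factor).
SENSOR_RECORD = [
    ("timestamp", "I", 1),      # unsigned 32-bit integer, seconds
    ("temperature", "h", 0.1),  # signed 16-bit integer, tenths of a degree
    ("status", "B", 1),         # unsigned 8-bit flags
]


def make_parser(spec, byte_order="<"):
    """Build a parse function for the record layout described by spec."""
    # "<" selects little-endian with no padding between fields.
    layout = struct.Struct(byte_order + "".join(code for _, code, _ in spec))

    def parse(raw):
        # Pull raw bytes into Python values, then apply each field's scale.
        values = layout.unpack(raw[:layout.size])
        return {name: value * scale
                for (name, _, scale), value in zip(spec, values)}

    parse.size = layout.size  # number of bytes one record occupies
    return parse


parse_sensor = make_parser(SENSOR_RECORD)
raw = struct.pack("<IhB", 1000, 215, 3)  # sample bytes for the demo
record = parse_sensor(raw)
print(record)
```

Changing the layout means editing only `SENSOR_RECORD`, and a new operation (a printer, say) would be written once against the spec rather than once per structure.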
What I used was a Perl script called Jeeves, which worked, but it is general-purpose, so any functions I wanted for manipulating my data I had to write from the ground up.
Are there any frameworks that are well-suited for creating parsers for structured binary data? What I've read of Antlr suggests that it's overkill. My current target languages of interest are C#, C++, and Java, if it matters.
Thanks as always.
Edit: I'll put a bounty on this question. If there are any areas that I should be looking at (keywords to search on) or other ways of attacking this problem that you've developed yourself, I'd love to hear about them.
Comments (2)
You may also want to look at Kaitai Struct, a relatively new project that provides a language for this purpose and also has a good IDE:
Kaitai.io
You might find ASN.1 interesting, as it provides an abstract way to describe the data you might be processing. If you use ASN.1 to describe the data abstractly, you need a way to map that abstract data to concrete binary streams, for which ECN (Encoding Control Notation) is likely the right choice.
The New Jersey Machine-Code Toolkit is actually focused on binary data streams corresponding to instruction sets, but I think that's a superset of just binary streams. It has very nice facilities for defining fields in terms of bit strings and for automatically generating accessors and generators for them. This might be particularly useful if your binary data structures contain pointers to other parts of the data stream.
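As a rough illustration of what "fields in terms of bit strings" with generated accessors can look like, here is a small Python sketch. It does not use the toolkit's own specification syntax; the 16-bit header layout and the helper names (`make_field`, `decode`, `encode`) are made up for the example.

```python
def make_field(name, low, high):
    """Return a (getter, setter) pair for bits low..high (inclusive) of a word."""
    width = high - low + 1
    mask = (1 << width) - 1

    def get(word):
        return (word >> low) & mask

    def put(word, value):
        # Reject values that are negative or too wide for the field.
        if value & ~mask:
            raise ValueError(f"{name} does not fit in {width} bits")
        return (word & ~(mask << low)) | (value << low)

    return get, put


# Hypothetical 16-bit header: bits 0-3 version, 4-9 length, 10-15 offset.
FIELDS = {"version": (0, 3), "length": (4, 9), "offset": (10, 15)}
ACCESSORS = {name: make_field(name, lo, hi) for name, (lo, hi) in FIELDS.items()}


def decode(word):
    """Extract every declared field from a packed word."""
    return {name: get(word) for name, (get, _) in ACCESSORS.items()}


def encode(values):
    """Pack a dict of field values into a single word."""
    word = 0
    for name, value in values.items():
        _, put = ACCESSORS[name]
        word = put(word, value)
    return word


word = encode({"version": 2, "length": 17, "offset": 5})
print(hex(word), decode(word))
```

The accessors are derived entirely from the `FIELDS` table, so adding or resizing a field is a one-line change to the declaration.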