How to write your own config format
I've developed my own file format for configuration files (plain text and line-based: each line is one configuration entry) for an application. The format is nothing special, and the only reason I'm doing this is to learn something! The reader and writer functions will be implemented in C (with GLib, because the file should be UTF-8 encoded).
So now I'm thinking about how to implement this format in C, and which steps I have to take to get error messages that are as good as possible. I've heard of lexers and parsers but have never gone deep into them; I have only a very abstract idea of them. So which steps do I need to take to get a clean reader for the format, written in C, that is also maintainable when the format changes in the future? What are the topics to learn or think about?
And yes, I know: C is a pain, there are a lot of different "sexy" formats for this purpose, and so on. I want to learn something!
Cheers,
Gregor
Additional information
- The reader/writer/parser (or whatever it's called) should depend as little as possible on third-party programs/components. The application around this config part already uses GLib, so that's why GLib is also used for UTF-8.
Comments (3)
One cool way of creating a config format is to embed a scripting language.
This gives you the parser for free and gives you the possibility to generate data on the fly or define variables that are being reused:
Consider these examples of XML vs an ugly pseudo scripting language:
vs:
or perhaps
vs
The first example compresses a list of points to a few statements; the second example shows how to remove lots of redundant data using a temporary variable.
A popular language for this kind of situation is Lua. Exactly how to map a scripting language to configuration is up to the integrator, but it's really powerful and it comes with parsing and type checking for free.
You might want to look at the libconfig source code. It has a lightweight parser you could use as a starting point and that will probably help you in figuring out what a parser for your own format would have to look like.
Though, if you really want to learn about parsers and lexers, it would probably be better to implement a simple compiler. There's an MIT course you could follow.
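To give a rough idea of what a small hand-written reader for a line-based format could look like, here is a minimal sketch in plain C. It assumes a hypothetical `key = value` syntax with `#` comments, which is not necessarily the asker's format, and all names (`cfg_entry`, `cfg_error`, `cfg_parse_line`) are made up for illustration. Keys are limited to ASCII letters, digits and `_`; the value is copied byte for byte, so UTF-8 text passes through unchanged (validation, for example with GLib, is left out). The point is that every failure records a line, a column and a message, which is what makes good error messages possible.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define CFG_MAX 128

/* One parsed "key = value" entry (hypothetical format). */
typedef struct {
    char key[CFG_MAX];
    char value[CFG_MAX];
} cfg_entry;

/* Where and why parsing failed; column is a 1-based byte offset. */
typedef struct {
    int line;
    int column;
    char message[96];
} cfg_error;

/* Fill in the error record and return the error code. */
static int cfg_fail(cfg_error *err, int line, int column, const char *msg)
{
    err->line = line;
    err->column = column;
    snprintf(err->message, sizeof err->message, "%s", msg);
    return -1;
}

/* Parse one line: returns 1 for an entry, 0 for blank/comment, -1 on error. */
int cfg_parse_line(const char *text, int line, cfg_entry *out, cfg_error *err)
{
    assert(text != NULL && out != NULL && err != NULL);
    const char *p = text;

    while (*p == ' ' || *p == '\t') p++;
    if (*p == '\0' || *p == '\n' || *p == '#')
        return 0;

    /* Key: ASCII letters, digits and '_' */
    const char *start = p;
    while (isalnum((unsigned char)*p) || *p == '_') p++;
    size_t len = (size_t)(p - start);
    if (len == 0)
        return cfg_fail(err, line, (int)(p - text) + 1, "expected a key name");
    if (len >= CFG_MAX)
        return cfg_fail(err, line, (int)(start - text) + 1, "key is too long");
    memcpy(out->key, start, len);
    out->key[len] = '\0';

    while (*p == ' ' || *p == '\t') p++;
    if (*p != '=')
        return cfg_fail(err, line, (int)(p - text) + 1, "expected '=' after key");
    p++;
    while (*p == ' ' || *p == '\t') p++;

    /* Value: rest of the line with trailing whitespace trimmed */
    start = p;
    const char *end = p + strlen(p);
    while (end > start && isspace((unsigned char)end[-1])) end--;
    len = (size_t)(end - start);
    if (len == 0)
        return cfg_fail(err, line, (int)(p - text) + 1, "expected a value after '='");
    if (len >= CFG_MAX)
        return cfg_fail(err, line, (int)(start - text) + 1, "value is too long");
    memcpy(out->value, start, len);
    out->value[len] = '\0';
    return 1;
}
```

A caller would read the file line by line (with `fgets` or GLib's I/O functions), pass each line with its line number to `cfg_parse_line`, and on `-1` print something like `file:line:column: message` from the `cfg_error` record.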
Depending on how deep you'd like to dive into learning the matter, you should think about not writing your parser manually. You can do so of course, but it will be a great deal more complicated and adding new features to your language will burden you with the problems of always adapting lexer and parser code.
The good thing is, there are lots of tools out there that enable you to generate this stuff from a high-level description of your input and its structure. Standard *nix tools to do so are Lex and Yacc (or their descendants Flex and Bison), but I'd like to point you to ANTLR (http://www.antlr.org) instead. One of its nice features is that it provides backends for many different languages (C/C++ as well as Java, Python, Ruby, C#, ...), so learning how to work with it will also help you if you want to switch languages at a later point.
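To make the lexer half of that split concrete, here is a small hand-written tokenizer in C. It is only a sketch of the kind of work a generated lexer takes over, not the output of Flex or ANTLR, and the token kinds and names (`tok_kind`, `token`, `lexer_next`) are assumptions chosen for illustration. Each token remembers its column, so a parser built on top can report where an unexpected token appeared.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Token kinds for a hypothetical "name = 123" style line. */
typedef enum { TOK_IDENT, TOK_EQUALS, TOK_NUMBER, TOK_END, TOK_ERROR } tok_kind;

typedef struct {
    tok_kind kind;
    const char *start;  /* points into the input; nothing is copied */
    size_t length;      /* length of the token in bytes */
    int column;         /* 1-based byte column of the first character */
} token;

typedef struct {
    const char *text;   /* the line being scanned */
    size_t pos;         /* current byte offset into text */
} lexer;

void lexer_init(lexer *lx, const char *text)
{
    assert(lx != NULL && text != NULL);
    lx->text = text;
    lx->pos = 0;
}

/* Return the next token; at end of line it keeps returning TOK_END. */
token lexer_next(lexer *lx)
{
    const char *s = lx->text;
    while (s[lx->pos] == ' ' || s[lx->pos] == '\t') lx->pos++;

    token t;
    t.start = s + lx->pos;
    t.column = (int)lx->pos + 1;
    t.length = 0;
    unsigned char c = (unsigned char)s[lx->pos];

    if (c == '\0' || c == '\n') {
        t.kind = TOK_END;
        return t;
    }
    if (c == '=') {
        t.kind = TOK_EQUALS;
        t.length = 1;
        lx->pos++;
        return t;
    }
    if (isalpha(c) || c == '_') {
        /* Identifier: letter or '_' followed by letters, digits, '_' */
        while (isalnum((unsigned char)s[lx->pos]) || s[lx->pos] == '_') lx->pos++;
        t.kind = TOK_IDENT;
    } else if (isdigit(c)) {
        /* Number: a run of decimal digits */
        while (isdigit((unsigned char)s[lx->pos])) lx->pos++;
        t.kind = TOK_NUMBER;
    } else {
        /* Unknown byte: consume it so the caller can report and continue */
        lx->pos++;
        t.kind = TOK_ERROR;
    }
    t.length = (size_t)(s + lx->pos - t.start);
    return t;
}
```

A parser would call `lexer_next` in a loop and check that the tokens arrive in the expected order (identifier, `=`, value, end). With a generator, the character classes above would instead be written as a few pattern rules and the scanning loop would be produced for you.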