I recently added source file parsing to an existing tool that generates output files from complex command line arguments.
The command line arguments got so complex that we started allowing them to be supplied as a file, which was parsed as if it were one very large command line, but the syntax was still awkward. So I added the ability to parse a source file with a more reasonable syntax.
I used flex 2.5.4 for Windows to generate the tokenizer for this custom source file format, and it worked. But I hated the code: global variables, a weird naming convention, and awful generated C++. The existing code generation backend is glued directly to the output of flex; I don't use yacc or bison.
I'm about to dive back into that code, and I'd like to use a better, more modern tool. Does anyone know of something that:
- Runs in the Windows command prompt (Visual Studio integration is OK, but I use makefiles to build)
- Generates a proper encapsulated C++ tokenizer. (No global variables)
- Uses regular expressions for describing the tokenizing rules (compatibility with lex syntax is a plus)
- Does not force me to use the C runtime (or fake it) for file reading, i.e. can parse from memory
- Warns me when my rules force the tokenizer to backtrack (or fixes it automatically)
- Gives me full control over variable and method names (so I can conform to my existing naming convention)
- Allows me to link multiple parsers into a single .exe without name collisions
- Can generate a Unicode (16-bit UCS-2) parser if I want it to
- Is NOT an integrated tokenizer + parser-generator (I want a lex replacement, not a lex+yacc replacement)
I could probably live with a tool that just generated the tokenizing tables if that was the only thing available.
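
To make the requirements above concrete, here is a minimal hand-written sketch of the shape of tokenizer being asked for: all state lives in an object, input comes from a memory range rather than a `FILE*`, and everything sits in its own namespace so a second tokenizer could be linked into the same executable. The names `cfgsrc`, `Tokenizer`, `Token`, and `TokenizeAll` are assumptions made up for illustration, and the token rules are deliberately trivial; this is not the output of any of the tools discussed here.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

namespace cfgsrc {  // hypothetical namespace, so two tokenizers never clash

enum class TokenKind { Identifier, Number, Punct, End };

struct Token {
    TokenKind kind;
    std::string text;
};

// Every piece of scanner state is a data member: no globals, and any
// number of instances can run side by side.
class Tokenizer {
public:
    // Scans the half-open memory range [begin, end); no file I/O involved.
    Tokenizer(const char* begin, const char* end) : cur_(begin), end_(end) {}

    Token Next() {
        while (cur_ != end_ && std::isspace(Byte(*cur_))) ++cur_;
        if (cur_ == end_) return Token{TokenKind::End, std::string()};

        const char* start = cur_;
        if (std::isalpha(Byte(*cur_)) || *cur_ == '_') {
            while (cur_ != end_ && (std::isalnum(Byte(*cur_)) || *cur_ == '_')) ++cur_;
            return Token{TokenKind::Identifier, std::string(start, cur_)};
        }
        if (std::isdigit(Byte(*cur_))) {
            while (cur_ != end_ && std::isdigit(Byte(*cur_))) ++cur_;
            return Token{TokenKind::Number, std::string(start, cur_)};
        }
        ++cur_;  // anything else is a one-character punctuation token
        return Token{TokenKind::Punct, std::string(start, cur_)};
    }

private:
    // <cctype> functions need a value representable as unsigned char.
    static int Byte(char c) { return static_cast<unsigned char>(c); }

    const char* cur_;
    const char* end_;
};

// Convenience helper: collects every token up to (not including) End.
inline std::vector<Token> TokenizeAll(const std::string& input) {
    Tokenizer tokenizer(input.data(), input.data() + input.size());
    std::vector<Token> tokens;
    for (Token t = tokenizer.Next(); t.kind != TokenKind::End; t = tokenizer.Next()) {
        tokens.push_back(t);
    }
    return tokens;
}

}  // namespace cfgsrc
```

Calling `cfgsrc::TokenizeAll("out = 42;")` yields an identifier, two punctuation tokens, and a number. A generated tokenizer would replace the body of `Next()` with code or tables derived from regular expressions, but the outer interface is the part the list above is really about.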
Comments (5)
Ragel (http://www.complang.org/ragel/) fits most of your requirements.
The code it generates interferes very little with a program. The code is also incredibly fast, and the Ragel syntax is more flexible and readable than anything I've ever seen. It's a rock solid piece of software. It can generate a table-driven parser or a goto-driven parser.
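
For readers unfamiliar with the table-driven versus goto-driven distinction, the hand-written sketch below implements the same small state machine both ways. It recognizes `[0-9]+(\.[0-9]+)?`. This is only an illustration of the two styles, not Ragel output; the names in the hypothetical `numdfa` namespace are made up, and real generated code will look different.

```cpp
#include <string>

namespace numdfa {  // hypothetical namespace for this illustration

enum State { kStart, kInt, kDot, kFrac, kError, kNumStates };
enum CharClass { kDigit, kPoint, kOther, kNumClasses };

inline CharClass Classify(char c) {
    if (c >= '0' && c <= '9') return kDigit;
    if (c == '.') return kPoint;
    return kOther;
}

// Table-driven style: the machine is data, and one generic loop runs it.
const State kTransitions[kNumStates][kNumClasses] = {
    /* kStart */ {kInt,   kError, kError},
    /* kInt   */ {kInt,   kDot,   kError},
    /* kDot   */ {kFrac,  kError, kError},
    /* kFrac  */ {kFrac,  kError, kError},
    /* kError */ {kError, kError, kError},
};

inline bool MatchesTable(const std::string& s) {
    State state = kStart;
    for (char c : s) state = kTransitions[state][Classify(c)];
    return state == kInt || state == kFrac;  // the two accepting states
}

// Goto-driven style: each state becomes a label, each transition a jump.
// There is no table lookup, at the cost of more generated code.
inline bool MatchesGoto(const std::string& s) {
    const char* p = s.data();
    const char* const pe = p + s.size();

    if (p == pe || Classify(*p) != kDigit) return false;  // start state
    ++p;
in_int:
    if (p == pe) return true;
    if (Classify(*p) == kDigit) { ++p; goto in_int; }
    if (*p != '.') return false;
    ++p;
    if (p == pe || Classify(*p) != kDigit) return false;  // just after '.'
    ++p;
in_frac:
    if (p == pe) return true;
    if (Classify(*p) == kDigit) { ++p; goto in_frac; }
    return false;
}

}  // namespace numdfa
```

Both functions accept and reject the same strings. The table version is compact and easy to inspect, while the goto version trades size for fewer memory lookups per character, which is the usual reason a generator offers both.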
Flex also has a C++ output option.
The result is a set of classes that do the parsing.
Just add the following to the head of your lex file:
Then in your source it is:
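
The snippets that belonged to the two sentences above did not survive on this page. As a hedged reconstruction, the definitions section of a flex input file that asks for C++ output typically starts like this; the class name `Lexer` is an assumption, and the answer's exact options are not known:

```lex
%option c++
%option yyclass="Lexer"
%option noyywrap
```

With `%option c++`, flex emits a scanner class based on `yyFlexLexer` (declared in `FlexLexer.h`) instead of a free `yylex()` function. `yyclass` tells flex that you have derived `Lexer` from `yyFlexLexer` yourself, so the rule actions are placed in `Lexer::yylex()`. On the source side, you would construct the lexer with a pointer to a `std::istream` and call `yylex()` in a loop until it returns 0; a `std::istringstream` wrapped around an in-memory buffer works as the stream. Flex also documents `%option prefix="..."` for renaming `yyFlexLexer` when several scanners must be linked into one executable.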
Boost.Spirit.Qi (parser-tokenizer) or Boost.Spirit.Lex (tokenizer only). I absolutely love Qi, and Lex is not bad either, but I just tend to take Qi for my parsing needs...
The only real drawback with Qi tends to be an increase in compile time, and it also runs slightly slower than hand-written parsing code. It is generally much faster than parsing with regex, though.
http://www.boost.org/doc/libs/1_41_0/libs/spirit/doc/html/index.html
Two tools come to mind, Antlr and GoldParser, although you would need to find out for yourself whether either is suitable. Both have language bindings that let them plug into a C++ runtime environment.
boost.spirit and the Yard parser come to mind. Note that lexer generators have to some extent been replaced by internal DSLs (domain-specific languages) embedded in C++ for specifying tokens. The appeal is that the token rules are part of your code: there is no external utility to run, you just follow a set of rules to write your grammar.
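
To show what specifying tokens in an internal C++ DSL means, here is a toy sketch built only on the standard library. It is not Boost.Spirit's or Yard's API; the `dsl` namespace, `Rule`, `CharIn`, `ZeroOrMore`, `OneOrMore`, and the overloaded `>>` and `|` operators are all assumptions invented for this example, loosely mimicking the operator-overloading style such libraries use.

```cpp
#include <cstddef>
#include <functional>
#include <string>

namespace dsl {  // hypothetical toy DSL, for illustration only

const std::size_t kFail = std::string::npos;

// A rule tries to match `s` starting at `pos` and returns the position
// just past the match, or kFail if it does not match.
struct Rule {
    std::function<std::size_t(const std::string&, std::size_t)> match;
};

// Matches one character that appears in `set`.
inline Rule CharIn(std::string set) {
    return Rule{[set](const std::string& s, std::size_t pos) -> std::size_t {
        return (pos < s.size() && set.find(s[pos]) != std::string::npos) ? pos + 1 : kFail;
    }};
}

// Sequence: a followed by b.
inline Rule operator>>(Rule a, Rule b) {
    return Rule{[a, b](const std::string& s, std::size_t pos) -> std::size_t {
        const std::size_t mid = a.match(s, pos);
        return mid == kFail ? kFail : b.match(s, mid);
    }};
}

// Ordered alternative: try a, fall back to b.
inline Rule operator|(Rule a, Rule b) {
    return Rule{[a, b](const std::string& s, std::size_t pos) -> std::size_t {
        const std::size_t end = a.match(s, pos);
        return end != kFail ? end : b.match(s, pos);
    }};
}

// Greedy repetition, zero or more times; never fails.
inline Rule ZeroOrMore(Rule r) {
    return Rule{[r](const std::string& s, std::size_t pos) -> std::size_t {
        std::size_t end = pos;
        for (;;) {
            const std::size_t next = r.match(s, end);
            if (next == kFail || next == end) return end;
            end = next;
        }
    }};
}

inline Rule OneOrMore(Rule r) { return r >> ZeroOrMore(r); }

// Token definitions written directly in C++, no external generator step.
const std::string kDigits = "0123456789";
const std::string kLetters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_";

inline Rule Number() { return OneOrMore(CharIn(kDigits)); }
inline Rule Identifier() {
    return CharIn(kLetters) >> ZeroOrMore(CharIn(kLetters + kDigits));
}

}  // namespace dsl
```

For example, `dsl::Identifier().match("foo_1 = 2", 0)` returns the position just past `foo_1`. Real libraries in this space do far more at compile time, which is where the longer build times mentioned for Qi come from, whereas this sketch pays for its simplicity with `std::function` calls at run time.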