用于解析 C++ 中的许多小文本的最佳解析器生成器?
出于性能原因,我将 C# 库移植到 C++。在正常操作期间,该库除其他事项外,还需要解析大约 150,000 个平均长度小于 150 个字符的数学表达式(例如 Excel 公式)。
在C#版本中,我使用GOLD解析器来生成解析代码。它可以在一秒内解析所有 150,000 个表达式。
因为我们正在考虑扩展我们的语言,所以我认为转向 C++ 可能是转向 ANTLR 的好机会。我已将(简单)语法移植到 ANTLR 并从中生成了 C 代码。解析 150'000 个表达式需要超过 12 秒,因为对于每个表达式,我需要创建一个新的 ANTL3_INPUT_STREAM、令牌流、词法分析器和解析器 - 至少在版本 3.4 中,无法重用它们。
如果有人能给我一个建议,我将不胜感激 - GOLD 当然是一个选项,尽管生成 C++ 或 C 代码似乎比 C# 代码复杂得多。我的语法与 LALR 和 LL(1) 兼容。最重要的是解析小输入的性能。
I am, for performance reason, porting a C# library to C++. During normal operation, this library needs, amongst other things, to parse about 150'000 math expressions (think excel formulas) with an average length of less than 150 characters.
In the C# version, I used GOLD parser to generate parsing code. It can parse all 150'000 expressions in under one second.
Because we were thinking about extending our language, I figured the move to C++ might be a good chance to change to ANTLR. I have ported the (simple) grammar to ANTLR and generated C code out of it. Parsing the 150'000 expressions takes over 12 seconds, because for each expression, I need to create a new ANTL3_INPUT_STREAM, token stream, lexer and parser - there is, at least in version 3.4, no way to reuse them.
I'd be grateful is someone could give me a recommendation what to use instead - GOLD is of course an option though generating C++ or C code seems a lot more complicated than the C# variety. My grammar is LALR and LL(1) compatible. Paramount concern is parsing performance on small inputs.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
我会尝试提升::精神。它通常非常快(即使是解析整数等简单的东西,它也比 C 函数 atoi http://alexott.blogspot.com/2010/01/boostspirit2-vs-atoi.html)
http://boost-spirit.com/home/
它有很好的东西:只有标题,所以依赖地狱,自由许可证。
但请注意,学习曲线很困难。它是现代 C++(没有指针,但有很多模板和非常令人沮丧的编译错误),因此来自 C 或 C#,您可能会不太舒服。
I would try boost::spirit. It is often extreamly fast (even for parsing simple things like an integer it can be faster than the C function atoi http://alexott.blogspot.com/2010/01/boostspirit2-vs-atoi.html)
http://boost-spirit.com/home/
It has nice things : header only, so dependency hell, liberal licence.
However be warned that the learning curve is difficult. It's modern C++ (no pointer, but a lot of template and very frustrating compiling errors), so coming from C or C#, you might not be very comfortable.
如果要解析的语法很简单,您可能只需手动编写解析器即可。
大多数解析器生成器的设计目的是让您可以轻松地启动一个可用的解析器,而执行时间通常会因此受到影响。
If the grammar to be parsed is simple, you might just write the parser by hand.
Most parser generators are designed to make it easy to whip up a working parser, and execution time often suffers as a result.
我在解析中看到的最佳性能来自 Boost.Spirit.Qi,它使用元模板编程在 C++ 中表达语法。但这不适合胆小的人。
这需要很好地隔离,并且包含解析器的文件的编译时间将增加到几秒钟(因此最好确保那里的时间尽可能少)。
The best performance I have seen in parsing came from Boost.Spirit.Qi which expresses the grammar in C++ using meta-template programming. It is not for the faint of heart though.
This will need be well insulated, and the compilation time of the file containing the parser will increase to several seconds (so better make sure there is as few as possible there).
如果表达式的语法足够简单,请考虑制作一个手写的递归下降解析器。它可以运行得非常快,并且使您能够(足够小心)报告良好且具体的语法错误。
您也可以使用 bison,但我相信手写的递归解析器可能会更快。
您可以使用 flex 生成的词法分析器进行词法分析,并以递归方式手动进行解析下降方式。
供您参考,GCC 编译器有自己的 C++ 和 C++ 递归下降解析器。至少C。它不再使用解析器生成器(例如 bison 或 ANTLR)。
If the syntax of your expression is simple enough, consider making a hand-written recursive descent parser. It can run really fast, and give you the ability (with enough care) to report nicely and specifically syntax errors.
You could also use bison, but I believe a hand-written recursive parser would probably go faster.
And you can do lexing with a flex generated lexer, and do the parsing manually, in a recursive descent way.
For your information the GCC compiler has its own recursive-descent parsers for C++ & C at least. It does not use parser generators (like bison or ANTLR) anymore.
而不是
expr
让您在语法上识别sequence-of-expr
。编辑:
而不是(野牛语法):
有:
Instead of
expr
make you grammar recognizesequence-of-expr
.EDIT:
Instead of having (bison syntax):
have:
我已经编写了很多解析器,并且手工编码的递归下降是我的做法。它们很容易编写并且几乎是最佳的。
也就是说,如果您追求的是速度,那么无论您写什么,都有足够的空间来加快速度。
这些方式可能会让你感到惊讶,因为任何你能想到的事情,你都已经做过了。
这是幻灯片集展示了如何操作它。
I've written many parsers, and hand-coded recursive-descent is the way I do it. They are easy to write and pretty much optimal.
That said, if speed is what you're after, no matter what you write there will be plenty of room to speed it up.
These will be in ways that could surprise you, because anything you could think of, you would have already done.
Here's a slide set showing how to do it.