In several questions I've seen the Spirit parser-generator framework from boost.org recommended, but then the comments contain grumbling from Spirit users who are not happy. Will those people please stand forth and explain to the rest of us what the drawbacks or downsides of using Spirit are?
It's quite a cool idea, and I liked it; it was especially useful for really learning how to use C++ templates.
But their documentation recommends using Spirit for small to medium-sized parsers. A parser for a full language would take ages to compile.
I will list three reasons.
Scannerless parsing. While it's simpler, it can slow the parser down when backtracking is required. It's optional, though: a lexer can be integrated (see the C preprocessor built with Spirit). A grammar of ~300 lines (including both the .h and .cpp files) compiles (unoptimized) to a 6M file with GCC. Inlining and maximum optimization get that down to ~1.7M.
Slow parsing: there is no static checking of the grammar, neither to warn that excessive lookahead is required nor to catch basic errors such as left recursion (which leads to infinite recursion in recursive-descent parsers for LL grammars). Left recursion is not a hard bug to track down, but excessive lookahead can cause exponential parsing times.
Heavy template usage: while this has certain advantages, it hurts compilation times and code size. Additionally, the grammar definition must normally be visible to all of its users, which hurts compilation times even more.
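To make the parsing-speed points concrete, here is a small standalone C++ sketch. It does not use Spirit: it is a hand-written recursive-descent parser with ordered choice and full backtracking, which I am assuming is a fair model of what a Spirit grammar does when alternatives share a prefix. The grammar `A` and the `MinusParser` struct are made-up examples. The first part counts how often `A` is entered when every level has to throw its work away and redo it; the second shows the usual hand rewrite of a left-recursive rule into a loop, since nothing checks the grammar for you.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// 1) Exponential backtracking. Hypothetical grammar with ordered choice:
//      A := 'x' A 'y' | 'x' A 'z' | (empty)
//    On input x^n z^n every level parses "'x' A" for the first alternative,
//    fails on 'y', backtracks, and parses the same prefix again for the
//    second alternative, so the number of calls to A doubles per level.
struct Backtracker {
    std::string src;
    std::size_t calls = 0;

    bool ch(std::size_t& pos, char c) const {
        if (pos < src.size() && src[pos] == c) { ++pos; return true; }
        return false;
    }

    bool A(std::size_t& pos) {
        ++calls;
        std::size_t p = pos;
        if (ch(p, 'x') && A(p) && ch(p, 'y')) { pos = p; return true; }
        p = pos;  // backtrack and redo the same 'x' A prefix
        if (ch(p, 'x') && A(p) && ch(p, 'z')) { pos = p; return true; }
        return true;  // empty alternative: succeed without consuming input
    }
};

// 2) Left recursion. The natural rule  expr := expr '-' num | num  would make
//    expr() call itself before consuming any input, recursing until the stack
//    overflows. The standard rewrite is  expr := num ('-' num)*, with a loop
//    that folds from the left so that 10-3-2 still means (10-3)-2.
struct MinusParser {
    std::string src;
    std::size_t pos = 0;

    bool num(long& out) {
        const std::size_t start = pos;
        out = 0;
        while (pos < src.size() && std::isdigit(static_cast<unsigned char>(src[pos]))) {
            out = out * 10 + (src[pos] - '0');
            ++pos;
        }
        return pos > start;
    }

    bool expr(long& out) {
        if (!num(out)) return false;
        long rhs = 0;
        while (pos < src.size() && src[pos] == '-') {
            ++pos;
            if (!num(rhs)) return false;
            out -= rhs;  // left-associative: ((a - b) - c)
        }
        return true;
    }
};
```

With 10 leading `x` characters, `A` is entered 2047 (2^11 - 1) times for a 20-character input, and each extra `x` roughly doubles that. Memoizing results per (rule, position), as packrat parsers do, is one standard remedy; this sketch deliberately does not do it.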
I've been able to move grammars to .cpp files by adding explicit template instantiations with the right parameters, but it was not easy.
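The general C++ mechanism behind that trick, sketched without Spirit: keep only a declaration in the header, put the definition in a .cpp file, and explicitly instantiate it there for the one iterator type the program uses. `Grammar`, `parse` and the file names are hypothetical, and both "files" are shown in one listing so it compiles on its own. For a real Spirit grammar the hard part is finding the exact template arguments (the scanner/iterator type) that the rest of the code will actually request.

```cpp
// ---- grammar.hpp (hypothetical file name) ----
// Only the declaration is visible to users of the grammar.
template <typename Iterator>
struct Grammar {
    bool parse(Iterator first, Iterator last) const;
};

// Explicit instantiation declaration (C++11). Optional when the definition is
// hidden in the .cpp, but it documents that Grammar<const char*> is
// instantiated elsewhere and suppresses implicit instantiation.
extern template struct Grammar<const char*>;

// ---- grammar.cpp (hypothetical file name) ----
// The expensive definition lives here; in a real grammar this is where the
// rule definitions and their huge types would be compiled.
template <typename Iterator>
bool Grammar<Iterator>::parse(Iterator first, Iterator last) const {
    // Stand-in grammar: one or more decimal digits.
    if (first == last) return false;
    for (; first != last; ++first)
        if (*first < '0' || *first > '9') return false;
    return true;
}

// Explicit instantiation definition: the one place Grammar<const char*> is
// compiled. Every Iterator type the program uses needs a line like this.
template struct Grammar<const char*>;
```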
UPDATE: my response is limited to my experience with Spirit classic, not Spirit V2. I would still expect Spirit to be heavily template-based, but now I'm just guessing.
In Boost 1.41 a new version of Spirit is being released, and it beats the pants off spirit::classic:
For me, the biggest problem is that expressions in Spirit, as seen by the compiler or debugger, are rather long (I copied part of one Spirit Classic expression below). These expressions scare me. When I work on a program that uses Spirit, I'm afraid to use valgrind or to print a backtrace in gdb.
```
boost::spirit::classic::parser_result<boost::spirit::classic::action<boost::spirit::classic::sequence<boost::spirit::classic::action<boost::spirit::classic::action<optional_suffix_parser<char const*>, boost::spirit::classic::ref_actor<std::vector<std::string, std::allocator<std::string> >, boost::spirit::classic::clear_action> >, boost::spirit::classic::ref_actor<std::vector<int, std::allocator<int> >, boost::spirit::classic::clear_action> >, boost::spirit::classic::sequence<boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::action<boost::spirit::classic::contiguous<boost::spirit::classic::sequence<boost::spirit::classic::alternative<boost::spirit::classic::chlit<char>, boost::spirit::classic::chlit<char> >, boost::spirit::classic::positive<boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::alnum_parser, boost::spirit::classic::chlit<char> >, boost::spirit::classic::chlit<char> > > > >, boost::spirit::classic::ref_value_actor<std::vector<std::string, std::allocator<std::string> >, boost::spirit::classic::push_back_action> >, boost::spirit::classic::action<boost::spirit::classic::rule<boost::spirit::classic::scanner<char const*, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> >, boost::spirit::classic::nil_t, boost::spirit::classic::nil_t>, boost::spirit::classic::ref_const_ref_actor<std::vector<std::string, std::allocator<std::string> >, std::string, boost::spirit::classic::push_back_action> > >, boost::spirit::classic::contiguous<boost::spirit::classic::sequence<boost::spirit::classic::chlit<char>, boost::spirit::classic::action<boost::spirit::classic::uint_parser<unsigned int, 10, 1u, -1>, boost::spirit::classic::ref_value_actor<std::vector<int, std::allocator<int> >, boost::spirit::classic::push_back_action> > > > >, boost::spirit::classic::kleene_star<boost::spirit::classic::sequence<boost::spirit::classic::chlit<char>, boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::action<boost::spirit::classic::contiguous<boost::spirit::classic::sequence<boost::spirit::classic::alternative<boost::spirit::classic::chlit<char>, boost::spirit::classic::chlit<char> >, boost::spirit::classic::positive<boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::alnum_parser, boost::spirit::classic::chlit<char> >, boost::spirit::classic::chlit<char> > > > >, boost::spirit::classic::ref_value_actor<std::vector<std::string, std::allocator<std::string> >, boost::spirit::classic::push_back_action> >, boost::spirit::classic::action<boost::spirit::classic::rule<boost::spirit::classic::scanner<char const*, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> >, boost::spirit::classic::nil_t, boost::spirit::classic::nil_t>, boost::spirit::classic::ref_const_ref_actor<std::vector<std::string, std::allocator<std::string> >, std::string, boost::spirit::classic::push_back_action> > >, boost::spirit::classic::contiguous<boost::spirit::classic::sequence<boost::spirit::classic::chlit<char>, boost::spirit::classic::action<boost::spirit::classic::uint_parser<unsigned int, 10, 1u, -1>, boost::spirit::classic::ref_value_actor<std::vector<int, std::allocator<int> >, boost::spirit::classic::push_back_action> > > > > > > > >, void (*)(char const*, char const*)>, boost::spirit::classic::scanner<char const*, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> > >::type
boost::spirit::classic::action<boost::spirit::classic::sequence<boost::spirit::classic::action<boost::spirit::classic::action<
```
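Why the types grow this large: Spirit builds parsers as expression templates, so every operator in a grammar produces a class template that wraps the types of its operands, and semantic actions, policies and scanner types nest on top of that. The toy below (namespace `et` and all its names are hypothetical, not Spirit's code) shows the effect on a three-character grammar, plus one common way to cut the nesting off: hiding a sub-grammar behind a type-erased `std::function`, which trades some inlining for short type names.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <type_traits>

namespace et {

// Matches a single character.
struct Lit {
    char c;
    bool parse(const std::string& s, std::size_t& pos) const {
        if (pos < s.size() && s[pos] == c) { ++pos; return true; }
        return false;
    }
};

template <typename L, typename R>
struct Seq {  // L followed by R
    L l;
    R r;
    bool parse(const std::string& s, std::size_t& pos) const {
        std::size_t p = pos;
        if (l.parse(s, p) && r.parse(s, p)) { pos = p; return true; }
        return false;
    }
};

template <typename L, typename R>
struct Alt {  // L, or else R (ordered choice)
    L l;
    R r;
    bool parse(const std::string& s, std::size_t& pos) const {
        std::size_t p = pos;
        if (l.parse(s, p)) { pos = p; return true; }
        p = pos;  // backtrack before trying the right-hand side
        if (r.parse(s, p)) { pos = p; return true; }
        return false;
    }
};

// Each operator nests one more level into the result type.
template <typename L, typename R>
Seq<L, R> operator>>(L l, R r) { return {l, r}; }

template <typename L, typename R>
Alt<L, R> operator|(L l, R r) { return {l, r}; }

// The grammar ('a' | 'b') >> 'c' already has a three-level type.
static_assert(std::is_same<decltype((Lit{'a'} | Lit{'b'}) >> Lit{'c'}),
                           Seq<Alt<Lit, Lit>, Lit>>::value,
              "each operator adds a level of nesting");

// Type-erased "rule": every grammar stored in it has the same short type,
// at the cost of an indirect call and less inlining.
using Rule = std::function<bool(const std::string&, std::size_t&)>;

template <typename P>
Rule erase(P p) {
    return [p](const std::string& s, std::size_t& pos) { return p.parse(s, pos); };
}

}  // namespace et
```

As far as I can tell, `Rule` plays roughly the role that `rule<...>` plays in the dump above, where it shows up as a leaf instead of being expanded further; treat that parallel as an approximation.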
Here is what I don't like about it:
The documentation is limited. There is one big web page where "everything" is explained, but the explanations lack detail.
Poor AST generation. ASTs are poorly explained and, even after banging your head against the wall to understand how the AST modifiers work, it's difficult to obtain an easy-to-manipulate AST (i.e. one that maps well to the problem domain).
It increases compilation times enormously, even for "medium"-sized grammars.
The syntax is too heavyweight. It is a fact of life that in C/C++ you must duplicate code (i.e. between declaration and definition). However, in boost::spirit it seems that when you declare a grammar<> you must repeat some things three times :D (when you want ASTs, which is what I want :D)
Other than this, I think they did a pretty good job with the parser, given the limitations of C++. But I think they should improve it further. The history page says there was a "dynamic" Spirit before the current "static" Spirit; I wonder how much faster it was and how much better its syntax was.
I would say the biggest problem is the lack of any diagnosis or other help for grammar problems. If your grammar is ambiguous, the parser might not parse what you expect it to, and there's no good way of noticing that.
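A concrete way a badly ordered grammar goes wrong silently: with an ordered choice (which, as far as I know, is what Spirit's `|` is), the first alternative that matches wins, even when a later one would have matched more input. The plain C++ sketch below uses hypothetical functions `first_match` and `shadowed` to show the misparse and the kind of simple check that would catch it for literal alternatives.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Ordered choice over string literals: the first alternative that is a prefix
// of the input wins, even if a later one would consume more. Returns the
// number of characters consumed; this toy returns 0 both for "no match" and
// for a match of an empty alternative.
std::size_t first_match(const std::string& input, const std::vector<std::string>& alts) {
    for (const std::string& a : alts)
        if (input.compare(0, a.size(), a) == 0) return a.size();
    return 0;
}

// A grammar check of the missing kind: list literal alternatives that can
// never be chosen because an earlier alternative is a prefix of them.
std::vector<std::string> shadowed(const std::vector<std::string>& alts) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < alts.size(); ++i)
        for (std::size_t j = 0; j < i; ++j)
            if (alts[i].compare(0, alts[j].size(), alts[j]) == 0) {
                out.push_back(alts[i]);
                break;
            }
    return out;
}
```

With the alternatives `"in"` then `"int"`, the input `int x;` is split as `in` followed by a stray `t`, and the error only surfaces later, if at all. Putting the longer literal first fixes it, which is exactly what `shadowed` reports.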