What are the disadvantages of the Spirit parser-generator framework from boost.org?

Posted on 2024-07-12 13:57:20

In several questions I've seen recommendations for the Spirit parser-generator framework from boost.org, but then in the comments there is grumbling from people using Spirit who are not happy. Will those people please stand forth and explain to the rest of us what are the drawbacks or downsides to using Spirit?

Comments (5)

伴随着你 2024-07-19 13:57:20

It is quite a cool idea, and I liked it; it was especially useful for really learning how to use C++ templates.

But its documentation recommends using Spirit for small to medium-sized parsers; a parser for a full language would take ages to compile.
I will list three reasons.

  • Scannerless parsing. While it's simpler, it can slow the parser down when backtracking is required. It's optional, though: a lexer can be integrated (see the C preprocessor built with Spirit). A grammar of ~300 lines (including both .h and .cpp files) compiles (unoptimized) to a 6 MB file with GCC; inlining and maximum optimization bring that down to ~1.7 MB.

  • Slow parsing - there is no static checking of the grammar, neither to warn about excessive lookahead nor to catch basic errors such as left recursion (which leads to infinite recursion in recursive-descent parsers for LL grammars). Left recursion is not a very hard bug to track down, but excessive lookahead can cause exponential parsing times.

  • Heavy template usage - while this has certain advantages, it hurts compilation times and code size. Additionally, the grammar definition normally has to be visible to all of its users, which increases compilation times even further.
    I've been able to move grammars to .cpp files by adding explicit template instantiations with the right parameters, but it was not easy.

UPDATE: my response is limited to my experience with Spirit Classic, not Spirit V2. I would still expect Spirit V2 to be heavily template-based, but now I'm just guessing.

在梵高的星空下 2024-07-19 13:57:20

In Boost 1.41 a new version of Spirit is being released, and it beats the pants off spirit::classic:

After a long time in beta (more than 2 years with Spirit 2.0), Spirit 2.1 will finally be released with the upcoming Boost 1.41 release. The code is very stable now and is ready for production code. We are working hard on finishing the documentation in time for Boost 1.41. You can peek at the current state of the documentation here. Currently, you can find the code and documentation in the Boost SVN trunk. If you have a new project involving Spirit, we highly recommend starting with Spirit 2.1 now. Allow me to quote OvermindDL's post from the Spirit mailing list:

I may start to sound like a bot with how often I say this, but Spirit.Classic is ancient, you should switch to Spirit2.1, it can do everything you did above a GREAT deal easier, a lot less code, and it executes faster. For example, Spirit2.1 can build your entire AST inline, no weird overriding, no need to build things up afterwards, etc..., all as one nice and fast step. You really need to update. See the other posts from the past day for links to docs and such for Spirit2.1. Spirit2.1 is currently in Boost Trunk, but will be formally released with Boost 1.41, but is otherwise complete.

百善笑为先 2024-07-19 13:57:20

For me, the biggest problem is that expressions in Spirit, as seen by the compiler or debugger, are rather long (below is part of one expression from Spirit Classic). These expressions scare me. When I work on a program that uses Spirit, I'm afraid to use valgrind or to print a backtrace in gdb.


boost::spirit::classic::parser_result<boost::spirit::classic::action<boost::spirit::classic::sequence<boost::spirit::classic::action<boost::spirit::classic::action<optional_suffix_parser<char const*>, boost::spirit::classic::ref_actor<std::vector<std::string, std::allocator<std::string> >, boost::spirit::classic::clear_action> >, boost::spirit::classic::ref_actor<std::vector<int, std::allocator<int> >, boost::spirit::classic::clear_action> >, boost::spirit::classic::sequence<boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::action<boost::spirit::classic::contiguous<boost::spirit::classic::sequence<boost::spirit::classic::alternative<boost::spirit::classic::chlit<char>, boost::spirit::classic::chlit<char> >, boost::spirit::classic::positive<boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::alnum_parser, boost::spirit::classic::chlit<char> >, boost::spirit::classic::chlit<char> > > > >, boost::spirit::classic::ref_value_actor<std::vector<std::string, std::allocator<std::string> >, boost::spirit::classic::push_back_action> >, boost::spirit::classic::action<boost::spirit::classic::rule<boost::spirit::classic::scanner<char const*, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> >, boost::spirit::classic::nil_t, boost::spirit::classic::nil_t>, boost::spirit::classic::ref_const_ref_actor<std::vector<std::string, std::allocator<std::string> >, std::string, boost::spirit::classic::push_back_action> > >, boost::spirit::classic::contiguous<boost::spirit::classic::sequence<boost::spirit::classic::chlit<char>, boost::spirit::classic::action<boost::spirit::classic::uint_parser<unsigned int, 10, 1u, -1>, boost::spirit::classic::ref_value_actor<std::vector<int, std::allocator<int> >, boost::spirit::classic::push_back_action> > > > 
>, boost::spirit::classic::kleene_star<boost::spirit::classic::sequence<boost::spirit::classic::chlit<char>, boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::action<boost::spirit::classic::contiguous<boost::spirit::classic::sequence<boost::spirit::classic::alternative<boost::spirit::classic::chlit<char>, boost::spirit::classic::chlit<char> >, boost::spirit::classic::positive<boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::alnum_parser, boost::spirit::classic::chlit<char> >, boost::spirit::classic::chlit<char> > > > >, boost::spirit::classic::ref_value_actor<std::vector<std::string, std::allocator<std::string> >, boost::spirit::classic::push_back_action> >, boost::spirit::classic::action<boost::spirit::classic::rule<boost::spirit::classic::scanner<char const*, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> >, boost::spirit::classic::nil_t, boost::spirit::classic::nil_t>, boost::spirit::classic::ref_const_ref_actor<std::vector<std::string, std::allocator<std::string> >, std::string, boost::spirit::classic::push_back_action> > >, boost::spirit::classic::contiguous<boost::spirit::classic::sequence<boost::spirit::classic::chlit<char>, boost::spirit::classic::action<boost::spirit::classic::uint_parser<unsigned int, 10, 1u, -1>, boost::spirit::classic::ref_value_actor<std::vector<int, std::allocator<int> >, boost::spirit::classic::push_back_action> > > > > > > > >, void ()(char const, char const*)>, boost::spirit::classic::scanner<char const*, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> > >::type 
boost::spirit::classic::action<boost::spirit::classic::sequence<boost::spirit::classic::action<boost::spirit::classic::action<

私野 2024-07-19 13:57:20

Here is what I don't like about it:

  • The documentation is limited. There is one big web page where "everything" is explained, but the explanations lack detail.

  • Poor AST generation. ASTs are poorly explained and, even after banging your head against the wall to understand how the AST modifiers work, it's difficult to obtain an AST that is easy to manipulate (i.e. one that maps well to the problem domain).

  • It increases compilation times enormously, even for "medium"-sized grammars.

  • The syntax is too heavyweight. It is a fact of life that in C/C++ you must duplicate code (i.e. between declaration and definition). However, in boost::spirit it seems that when you declare a grammar<>, you have to repeat some things three times :D (when you want ASTs, which is what I want :D)

Other than this, I think they did a pretty good job with the parser, given the limitations of C++. But I think they should improve it further. The history page describes a "dynamic" Spirit that existed before the current "static" Spirit; I wonder how much faster it was and how much better its syntax was.

予囚 2024-07-19 13:57:20

I would say the biggest problem is the lack of any diagnosis or other help for grammar problems. If your grammar is ambiguous, the parser might not parse what you expect it to, and there's no good way of noticing that.
