使用 Boost Spirit 解析语法

发布于 2024-09-06 07:04:49 字数 2302 浏览 13 评论 0原文

我正在尝试解析类似树表达式的 C 函数，如下所示（使用 Spirit Parser Framework ）：

F( A() , B( GREAT( SOME , NOT ) ) , C( YES ) )

为此，我尝试在以下语法上使用三个规则：

template< typename Iterator , typename ExpressionAST >
struct InputGrammar : qi::grammar<Iterator, ExpressionAST(), space_type> {

    InputGrammar() : InputGrammar::base_type( ) {
       tag = ( qi::char_("a-zA-Z_")  >> *qi::char_("a-zA-Z_0-9") )[ push_back( at_c<0>(qi::_val) , qi::_1 ) ];
       command =  tag [ at_c<0>(qi::_val) = at_c<0>(qi::_1) ] >> "(" >> (*instruction >> ",")
                                        [ push_back( at_c<1>(qi::_val) , qi::_1 ) ]  >> ")";
       instruction = ( command | tag ) [qi::_val = qi::_1];
    }
    qi::rule< Iterator , ExpressionAST() , space_type > tag;
    qi::rule< Iterator , ExpressionAST() , space_type > command;
    qi::rule< Iterator , ExpressionAST() , space_type > instruction;
};

请注意，我的标记规则只是尝试捕获表达式中使用的标识符（“函数”名称）。另请注意，标签规则的签名返回 ExpressionAST 而不是 std::string，如大多数示例中所示。我想这样做的原因实际上很简单：我讨厌使用变体，如果可能的话我会避免使用它们。我想，如果能保留蛋糕并把它吃掉，那就太棒了。

命令应以标签（当前节点的名称、AST 节点的第一个字符串字段）和括号括起来的可变数量的参数开头，每个参数可以是标签本身或另一个命令。

然而，这个例子根本不起作用。它可以编译一切，但在运行时它无法解析我的所有测试字符串。真正让我烦恼的是我不知道如何修复它，因为我无法真正调试上面的代码，至少在这个词的传统意义上是这样。基本上，我认为修复上述代码的唯一方法就是知道我做错了什么。

所以，问题是我不知道上面的代码有什么问题。你如何定义上面的语法？

我使用的 ExpressionAST 类型是：

struct MockExpressionNode {
    std::string name;
    std::vector< MockExpressionNode > operands;

    typedef std::vector< MockExpressionNode >::iterator iterator;
    typedef std::vector< MockExpressionNode >::const_iterator const_iterator;

    iterator begin() { return operands.begin(); }
    const_iterator begin() const { return operands.begin(); }
    iterator end() { return operands.end(); }
    const_iterator end() const { return operands.end(); }

    bool is_leaf() const {
        return ( operands.begin() == operands.end() );
    }
};

BOOST_FUSION_ADAPT_STRUCT(
    MockExpressionNode,
    (std::string, name)
    (std::vector<MockExpressionNode>, operands)
)

原文

I am trying to parse a C-function like tree expressions like the following (using the Spirit Parser Framework):

F( A() , B( GREAT( SOME , NOT ) ) , C( YES ) )

For this I am trying to use the three rules on the following grammar:

template< typename Iterator , typename ExpressionAST >
struct InputGrammar : qi::grammar<Iterator, ExpressionAST(), space_type> {

    InputGrammar() : InputGrammar::base_type( ) {
       tag = ( qi::char_("a-zA-Z_")  >> *qi::char_("a-zA-Z_0-9") )[ push_back( at_c<0>(qi::_val) , qi::_1 ) ];
       command =  tag [ at_c<0>(qi::_val) = at_c<0>(qi::_1) ] >> "(" >> (*instruction >> ",")
                                        [ push_back( at_c<1>(qi::_val) , qi::_1 ) ]  >> ")";
       instruction = ( command | tag ) [qi::_val = qi::_1];
    }
    qi::rule< Iterator , ExpressionAST() , space_type > tag;
    qi::rule< Iterator , ExpressionAST() , space_type > command;
    qi::rule< Iterator , ExpressionAST() , space_type > instruction;
};

Notice that my tag rule just tries to capture the identifiers used in the expressions (the 'function' names). Also notice that the signature of the tag rule returns a ExpressionAST instead of a std::string, like in most examples. The reason I want to do it like this is actually pretty simple: I hate using variants and I will avoid them if possible. It would be great to keep the cake and eat it too I guess.

A command should start with a tag (the name of the current node, first string field of the AST node) and a variable number of arguments enclosed by parentheses, and each of the arguments can be a tag itself or another command.

However, this example does not work at all. It compiles and everything, but at run time it fails to parse all my test strings. And the thing that really annoys me is that I can't figure how to fix it, since I can't really debug the above code, at least in the traditional meaning of the word. Basically the only way I see I can fix the above code is by knowing what I am doing wrong.

So, the question is that I don't know what is wrong with the above code. How would you define the above grammar?

The ExpressionAST type I am using is:

struct MockExpressionNode {
    std::string name;
    std::vector< MockExpressionNode > operands;

    typedef std::vector< MockExpressionNode >::iterator iterator;
    typedef std::vector< MockExpressionNode >::const_iterator const_iterator;

    iterator begin() { return operands.begin(); }
    const_iterator begin() const { return operands.begin(); }
    iterator end() { return operands.end(); }
    const_iterator end() const { return operands.end(); }

    bool is_leaf() const {
        return ( operands.begin() == operands.end() );
    }
};

BOOST_FUSION_ADAPT_STRUCT(
    MockExpressionNode,
    (std::string, name)
    (std::vector<MockExpressionNode>, operands)
)

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

倾其所爱 2024-09-13 07:04:49

就调试而言，可以使用正常的中断和监视方法。不过，由于规则的格式设置方式，这会变得很困难。如果您按照精神示例进行格式化（〜每行一个解析器，每行一个 phoenix 语句），断点将提供更多信息。

您的数据结构无法区分 A() 和 SOME，因为它们都是叶子（如果我遗漏了某些内容，请告诉我）。从您的变体评论来看，我认为这不是您的意图，因此为了区分这两种情况，我向 MockExpressionNode 添加了一个 bool commandFlag 成员变量（对于 A() 和 false 表示SOME），以及相应的融合适配器线。

具体来说，对于代码，您需要将启动规则传递给基本构造函数，即：

InputGrammar() : InputGrammar::base_type(instruction) {...}

这是语法中的入口点，也是您没有解析任何数据的原因。我很惊讶它没有它就编译了，我认为语法类型需要与第一条规则的类型匹配。即便如此，这仍然是一个方便遵循的约定。

对于tag规则，实际上有两个解析器qi::char_("a-zA-Z_")，即_1，类型为char 和 *qi::char_("a-zA-Z_0-9") ，它是 _2，类型为（基本上）vector。在没有自动规则的情况下不可能将它们强制转换为字符串，但是可以通过将规则附加到每个解析的字符来完成：

tag =   qi::char_("a-zA-Z_")
        [ at_c<0>(qi::_val) = qi::_1 ];
    >> *qi::char_("a-zA-Z_0-9")           //[] has precedence over *, so _1 is 
        [ at_c<0>(qi::_val) += qi::_1 ];  //  a char rather than a vector<char>

但是，让 Spirit 进行此转换会更干净。因此，定义一个新规则：

qi::rule< Iterator , std::string(void) , ascii::space_type > identifier;
identifier %= qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z_0-9");

并且不用担心它；）。然后标签变成

tag = identifier
      [
          at_c<0>(qi::_val) = qi::_1,
          ph::at_c<2>(qi::_val) = false //commandFlag
      ]

For command，第一部分很好，但是 (*instruction >>> ",")[ Push_back( at_c<1>(qi::_val) , qi::_1 有几个问题)]。这将解析零个或多个后跟“，”的指令规则。它还尝试push_back一个vector（也不知道为什么编译它，可能因为缺少启动规则而没有实例化？）。我认为您想要以下内容（带有标识符修改）：

command =
        identifier
        [
           ph::at_c<0>(qi::_val) = qi::_1, 
           ph::at_c<2>(qi::_val) = true    //commandFlag
        ]
    >>  "("
    >> -(instruction % ",")
        [
           ph::at_c<1>(qi::_val) = qi::_1
        ]
    >>  ")";

这使用可选运算符 - 和列表运算符 %，后者相当于 instruction > > *（“，”>>指令）。然后，phoenix 表达式将向量直接分配给结构成员，但您也可以将操作直接附加到指令匹配并使用 push_back。

指令规则很好，我只是提到它相当于instruction %= (command|tag)。

最后一件事，如果 A() 和 SOME 之间实际上没有区别（即您的原始结构没有 commandFlag），您可以编写该解析器仅使用自动规则：

template< typename Iterator , typename ExpressionAST >
struct InputGrammar : qi::grammar<Iterator, ExpressionAST(), ascii::space_type> {
   InputGrammar() : InputGrammar::base_type( command ) {
      identifier %=
             qi::char_("a-zA-Z_")
         >> *qi::char_("a-zA-Z_0-9");
      command %=
            identifier
         >> -(
            "("
         >> -(command % ",")
         >>  ")");
    }
    qi::rule< Iterator , std::string(void) , ascii::space_type > identifier;
    qi::rule< Iterator , ExpressionAST(void) , ascii::space_type > command;
};

这是使用对输入进行紧密建模的融合包装结构的一大好处。

As far as debugging, its possible to use a normal break and watch approach. This is made difficult by how you've formatted the rules though. If you format per the spirit examples (~one parser per line, one phoenix statement per line), break points will be much more informative.

Your data structure doesn't have a way to distinguish A() from SOME in that they are both leaves (let me know if I'm missing something). From your variant comment, I don't think this was your intention, so to distinguish these two cases, I added a bool commandFlag member variable to MockExpressionNode (true for A() and false for SOME), with a corresponding fusion adapter line.

For the code specifically, you need to pass the start rule to the base constructor, i.e.:

InputGrammar() : InputGrammar::base_type(instruction) {...}

This is the entry point in the grammar, and is why you were not getting any data parsed. I'm surprised it compiled without it, I thought that the grammar type was required to match the type of the first rule. Even so, this is a convenient convention to follow.

For the tag rule, there are actually two parsers qi::char_("a-zA-Z_"), which is _1 with type char and *qi::char_("a-zA-Z_0-9") which is _2 with type (basically) vector<char>. Its not possible to coerce these into a string without autorules, But it can be done by attaching a rule to each parsed char:

tag =   qi::char_("a-zA-Z_")
        [ at_c<0>(qi::_val) = qi::_1 ];
    >> *qi::char_("a-zA-Z_0-9")           //[] has precedence over *, so _1 is 
        [ at_c<0>(qi::_val) += qi::_1 ];  //  a char rather than a vector<char>

However, its much cleaner to let spirit do this conversion. So define a new rule:

qi::rule< Iterator , std::string(void) , ascii::space_type > identifier;
identifier %= qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z_0-9");

And don't worry about it ;). Then tag becomes

tag = identifier
      [
          at_c<0>(qi::_val) = qi::_1,
          ph::at_c<2>(qi::_val) = false //commandFlag
      ]

For command, the first part is fine, but theres a couple problems with (*instruction >> ",")[ push_back( at_c<1>(qi::_val) , qi::_1 ) ]. This will parse zero or multiple instruction rules followed by a ",". It also attempts to push_back a vector<MockExpressionNode> (not sure why this compiled either, maybe not instantiated because of the missing start rule?). I think you want the following (with the identifier modification):

command =
        identifier
        [
           ph::at_c<0>(qi::_val) = qi::_1, 
           ph::at_c<2>(qi::_val) = true    //commandFlag
        ]
    >>  "("
    >> -(instruction % ",")
        [
           ph::at_c<1>(qi::_val) = qi::_1
        ]
    >>  ")";

This uses the optional operator - and the list operator %, the latter is equivalent to instruction >> *("," >> instruction). The phoenix expression then just assigns the vector directly to the structure member, but you could also attach the action directly to the instruction match and use push_back.

The instruction rule is fine, I'll just mention that it is equivalent to instruction %= (command|tag).

One last thing, if there actually is no distinction between A() and SOME (i.e. your original structure with no commandFlag), you can write this parser using only autorules:

template< typename Iterator , typename ExpressionAST >
struct InputGrammar : qi::grammar<Iterator, ExpressionAST(), ascii::space_type> {
   InputGrammar() : InputGrammar::base_type( command ) {
      identifier %=
             qi::char_("a-zA-Z_")
         >> *qi::char_("a-zA-Z_0-9");
      command %=
            identifier
         >> -(
            "("
         >> -(command % ",")
         >>  ")");
    }
    qi::rule< Iterator , std::string(void) , ascii::space_type > identifier;
    qi::rule< Iterator , ExpressionAST(void) , ascii::space_type > command;
};

This is the big benefit of using a fusion wrapped structure that models the input closely.

回复收藏 0 原文

~没有更多了~