递归下降与递归上升解析

发布于 2024-12-14 23:43:00 字数 3059 浏览 2 评论 0原文

如果我正在编写自己的自定义解析器,我如何知道我是否正在编写递归上升解析器?我绝对对 LALR 解析的 O(n) 复杂度感兴趣(加上我已经有了 LALR 语法),并且不想稍后发现我已经编写了 LL 解析器。

编辑:我只见过自动表驱动的解析器和几个生成的简单示例递归解析器——它们看起来都不像我手工构建的任何东西。因此,很难将处理规则的“明显”代码与实际算法关联起来。

如果你把代码看成一个相对简单的规则,例如

name_or_qualified_name = identifier *('.' identifier);

我已经翻译成的,

std::vector<std::wstring> RecursiveParseNameOrQualifiedName(Iterator& begin, Iterator end) {
    std::vector<std::wstring> retval;
    retval.push_back(begin->Codepoints);
    CheckedIncrement(begin, end); // The only place that can legitimately expect end of input is namespace contents.
    while(begin->type == Wide::Lexer::TokenType::Dot) {
        CheckedIncrement(begin, end);
        if (begin->type != Wide::Lexer::TokenType::Identifier)
            Wide::ParserExceptionBuilder(*begin) << L"Expected 'identifier' after '.'";
        retval.push_back(begin->Codepoints);
    }
    return retval;
}

那么它就没有什么非常左或右的地方了。这显然是有用且重要的信息,但我没有看到。这里唯一明显的事实是它是递归的。

编辑:对不起,不好的例子。像这样的事情怎么样:

void RecursiveParseUsing(Iterator& begin, Iterator end, Wide::Parser::NamespaceAST* current_namespace) {
    auto new_using = std::unique_ptr<Wide::Parser::UsingAST>( new Wide::Parser::UsingAST() );
    // expect "begin" to point to a using
    CheckedIncrement(begin, end);
    // Must be an identifier, at least
    if (begin->type != Wide::Lexer::TokenType::Identifier)
        Wide::ParserExceptionBuilder(*begin) << L"Expected 'identifier' after 'using'";
    CheckedIncrement(begin, end);
    switch(begin->type) {
    case Wide::Lexer::TokenType::Dot: {
        begin--; // back to identifier
        new_using->original_name = RecursiveParseNameOrQualifiedName(begin, end);
        current_namespace->unnamed_contents.push_back(std::move(new_using));
        break; }
    case Wide::Lexer::TokenType::Equals: {
        begin--; // back to Identifier
        new_using->new_name = begin->Codepoints;
        begin++; // Back to equals
        begin++; // The token ahead of equals- should be "identifier"
        new_using->original_name = RecursiveParseNameOrQualifiedName(begin, end); // The only valid next production
        // this should be left at the one past the name
        current_namespace->contents[new_using->new_name] = std::move(new_using);
        break; }
    case Wide::Lexer::TokenType::Semicolon: {
        begin--; // Identifier
        new_using->original_name.push_back(begin->Codepoints);
        begin++; // Semicolon
        current_namespace->unnamed_contents.push_back(std::move(new_using));
        break; }
    default:
        Wide::ParserExceptionBuilder(*begin) << L"Expected '.', '=' or ';' after 'identifier' when parsing 'using'.";
    }
    if (begin->type != Wide::Lexer::TokenType::Semicolon)
        Wide::ParserExceptionBuilder(*begin) << L"Expected ';' after 'identifier'";
    CheckedIncrement(begin, end); // One-past-the-end
}

If I'm writing my own custom parser, how can I know if I'm writing a recursive ascent parser? I'm definitely interested in the O(n) complexity of LALR parsing (plus I already have a LALR grammar) and don't want to find out later that I've written an LL parser instead.

Edit: I've only ever seen automatic table-driven parsers and a couple generated simple example recursive parsers- none of which look remotely like anything I'd construct by hand. So it's kind of hard to associate the "obvious" code to deal with a rule with an actual algorithm.

If you take the code for a relatively simple rule, for example

name_or_qualified_name = identifier *('.' identifier);

which I've translated into

std::vector<std::wstring> RecursiveParseNameOrQualifiedName(Iterator& begin, Iterator end) {
    std::vector<std::wstring> retval;
    retval.push_back(begin->Codepoints);
    CheckedIncrement(begin, end); // The only place that can legitimately expect end of input is namespace contents.
    while(begin->type == Wide::Lexer::TokenType::Dot) {
        CheckedIncrement(begin, end);
        if (begin->type != Wide::Lexer::TokenType::Identifier)
            Wide::ParserExceptionBuilder(*begin) << L"Expected 'identifier' after '.'";
        retval.push_back(begin->Codepoints);
    }
    return retval;
}

There's nothing very left or right about it. It's obviously useful and important information to know, and I'm not seeing it. The only obvious fact here is that it's recursive.

Edit: Sorry, bad example. How about something like this:

void RecursiveParseUsing(Iterator& begin, Iterator end, Wide::Parser::NamespaceAST* current_namespace) {
    auto new_using = std::unique_ptr<Wide::Parser::UsingAST>( new Wide::Parser::UsingAST() );
    // expect "begin" to point to a using
    CheckedIncrement(begin, end);
    // Must be an identifier, at least
    if (begin->type != Wide::Lexer::TokenType::Identifier)
        Wide::ParserExceptionBuilder(*begin) << L"Expected 'identifier' after 'using'";
    CheckedIncrement(begin, end);
    switch(begin->type) {
    case Wide::Lexer::TokenType::Dot: {
        begin--; // back to identifier
        new_using->original_name = RecursiveParseNameOrQualifiedName(begin, end);
        current_namespace->unnamed_contents.push_back(std::move(new_using));
        break; }
    case Wide::Lexer::TokenType::Equals: {
        begin--; // back to Identifier
        new_using->new_name = begin->Codepoints;
        begin++; // Back to equals
        begin++; // The token ahead of equals- should be "identifier"
        new_using->original_name = RecursiveParseNameOrQualifiedName(begin, end); // The only valid next production
        // this should be left at the one past the name
        current_namespace->contents[new_using->new_name] = std::move(new_using);
        break; }
    case Wide::Lexer::TokenType::Semicolon: {
        begin--; // Identifier
        new_using->original_name.push_back(begin->Codepoints);
        begin++; // Semicolon
        current_namespace->unnamed_contents.push_back(std::move(new_using));
        break; }
    default:
        Wide::ParserExceptionBuilder(*begin) << L"Expected '.', '=' or ';' after 'identifier' when parsing 'using'.";
    }
    if (begin->type != Wide::Lexer::TokenType::Semicolon)
        Wide::ParserExceptionBuilder(*begin) << L"Expected ';' after 'identifier'";
    CheckedIncrement(begin, end); // One-past-the-end
}

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

尽揽少女心 2024-12-21 23:43:00

LL 和 LALR 都是 O(n),所以没关系。

然而,并非所有递归下降解析器都是 LL 的。那些不使用某种形式的回溯的 - 尝试一种产生式,当它不起作用时尝试另一种产生式,直到用尽所有可能的产生式或找到成功的解析。注意到您正在这样做并不难:-)

至于如何知道您正在构造 LL 还是 LALR 解析器 - 您可以通过了解您正在遵循的构造方法来知道它。

编辑添加:递归下降和递归上升之间的一个显着特征是过程的作用。在递归下降中,每个非终结符都有一个过程。在递归上升中,每个 LR 状态都有一个过程。为了获得后者,您几乎必须事先构建 LR 自动机(除非您经常这样做,以至于可以即时完成 - 但在这种情况下您就不会问这个问题)。您的第一个代码示例看起来像递归下降;但您没有告诉我们第二个代码示例与您的语法有何关系,因此很难告诉我们。

Both LL and LALR are O(n), so it doesn't matter.

However, not all recursive descent parsers are LL. Those that aren't use some form of backtracking ­– trying one production and when it doesn't work trying another until all possible productions have been exhausted or a successful parse is found. It's not very difficult to notice that you are doing this :-)

As to how you know whether you are constructing a LL or LALR parser – you know it by being aware of which construction method you are following.

Edited to add: One distinguishing feature between recursive descent and recursive ascent is the role of the procedures. In recursive descent, you have a procedure for each nonterminal. In recursive ascent, you have a procedure for each LR state. In order to have the latter, you pretty much have to construct the LR automaton beforehand (unless you've done this so often that you can do it on the fly – but in that case you wouldn't be asking this question). Your first code example looks like recursive descent; but you did not tell us how the second code sample relates to your grammar so it's difficult to tell there.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文