递归下降与递归上升解析

发布于 2024-12-14 23:43:00 字数 3059 浏览 2 评论 0原文

如果我正在编写自己的自定义解析器，我如何知道我是否正在编写递归上升解析器？我绝对对 LALR 解析的 O(n) 复杂度感兴趣（加上我已经有了 LALR 语法），并且不想稍后发现我已经编写了 LL 解析器。

编辑：我只见过自动表驱动的解析器和几个生成的简单示例递归解析器——它们看起来都不像我手工构建的任何东西。因此，很难将处理规则的“明显”代码与实际算法关联起来。

如果你把代码看成一个相对简单的规则，例如

name_or_qualified_name = identifier *('.' identifier);

我已经翻译成的，

std::vector<std::wstring> RecursiveParseNameOrQualifiedName(Iterator& begin, Iterator end) {
    std::vector<std::wstring> retval;
    retval.push_back(begin->Codepoints);
    CheckedIncrement(begin, end); // The only place that can legitimately expect end of input is namespace contents.
    while(begin->type == Wide::Lexer::TokenType::Dot) {
        CheckedIncrement(begin, end);
        if (begin->type != Wide::Lexer::TokenType::Identifier)
            Wide::ParserExceptionBuilder(*begin) << L"Expected 'identifier' after '.'";
        retval.push_back(begin->Codepoints);
    }
    return retval;
}

那么它就没有什么非常左或右的地方了。这显然是有用且重要的信息，但我没有看到。这里唯一明显的事实是它是递归的。

编辑：对不起，不好的例子。像这样的事情怎么样：

void RecursiveParseUsing(Iterator& begin, Iterator end, Wide::Parser::NamespaceAST* current_namespace) {
    auto new_using = std::unique_ptr<Wide::Parser::UsingAST>( new Wide::Parser::UsingAST() );
    // expect "begin" to point to a using
    CheckedIncrement(begin, end);
    // Must be an identifier, at least
    if (begin->type != Wide::Lexer::TokenType::Identifier)
        Wide::ParserExceptionBuilder(*begin) << L"Expected 'identifier' after 'using'";
    CheckedIncrement(begin, end);
    switch(begin->type) {
    case Wide::Lexer::TokenType::Dot: {
        begin--; // back to identifier
        new_using->original_name = RecursiveParseNameOrQualifiedName(begin, end);
        current_namespace->unnamed_contents.push_back(std::move(new_using));
        break; }
    case Wide::Lexer::TokenType::Equals: {
        begin--; // back to Identifier
        new_using->new_name = begin->Codepoints;
        begin++; // Back to equals
        begin++; // The token ahead of equals- should be "identifier"
        new_using->original_name = RecursiveParseNameOrQualifiedName(begin, end); // The only valid next production
        // this should be left at the one past the name
        current_namespace->contents[new_using->new_name] = std::move(new_using);
        break; }
    case Wide::Lexer::TokenType::Semicolon: {
        begin--; // Identifier
        new_using->original_name.push_back(begin->Codepoints);
        begin++; // Semicolon
        current_namespace->unnamed_contents.push_back(std::move(new_using));
        break; }
    default:
        Wide::ParserExceptionBuilder(*begin) << L"Expected '.', '=' or ';' after 'identifier' when parsing 'using'.";
    }
    if (begin->type != Wide::Lexer::TokenType::Semicolon)
        Wide::ParserExceptionBuilder(*begin) << L"Expected ';' after 'identifier'";
    CheckedIncrement(begin, end); // One-past-the-end
}

原文

If I'm writing my own custom parser, how can I know if I'm writing a recursive ascent parser? I'm definitely interested in the O(n) complexity of LALR parsing (plus I already have a LALR grammar) and don't want to find out later that I've written an LL parser instead.

Edit: I've only ever seen automatic table-driven parsers and a couple generated simple example recursive parsers- none of which look remotely like anything I'd construct by hand. So it's kind of hard to associate the "obvious" code to deal with a rule with an actual algorithm.

If you take the code for a relatively simple rule, for example

name_or_qualified_name = identifier *('.' identifier);

which I've translated into

std::vector<std::wstring> RecursiveParseNameOrQualifiedName(Iterator& begin, Iterator end) {
    std::vector<std::wstring> retval;
    retval.push_back(begin->Codepoints);
    CheckedIncrement(begin, end); // The only place that can legitimately expect end of input is namespace contents.
    while(begin->type == Wide::Lexer::TokenType::Dot) {
        CheckedIncrement(begin, end);
        if (begin->type != Wide::Lexer::TokenType::Identifier)
            Wide::ParserExceptionBuilder(*begin) << L"Expected 'identifier' after '.'";
        retval.push_back(begin->Codepoints);
    }
    return retval;
}

There's nothing very left or right about it. It's obviously useful and important information to know, and I'm not seeing it. The only obvious fact here is that it's recursive.

Edit: Sorry, bad example. How about something like this:

void RecursiveParseUsing(Iterator& begin, Iterator end, Wide::Parser::NamespaceAST* current_namespace) {
    auto new_using = std::unique_ptr<Wide::Parser::UsingAST>( new Wide::Parser::UsingAST() );
    // expect "begin" to point to a using
    CheckedIncrement(begin, end);
    // Must be an identifier, at least
    if (begin->type != Wide::Lexer::TokenType::Identifier)
        Wide::ParserExceptionBuilder(*begin) << L"Expected 'identifier' after 'using'";
    CheckedIncrement(begin, end);
    switch(begin->type) {
    case Wide::Lexer::TokenType::Dot: {
        begin--; // back to identifier
        new_using->original_name = RecursiveParseNameOrQualifiedName(begin, end);
        current_namespace->unnamed_contents.push_back(std::move(new_using));
        break; }
    case Wide::Lexer::TokenType::Equals: {
        begin--; // back to Identifier
        new_using->new_name = begin->Codepoints;
        begin++; // Back to equals
        begin++; // The token ahead of equals- should be "identifier"
        new_using->original_name = RecursiveParseNameOrQualifiedName(begin, end); // The only valid next production
        // this should be left at the one past the name
        current_namespace->contents[new_using->new_name] = std::move(new_using);
        break; }
    case Wide::Lexer::TokenType::Semicolon: {
        begin--; // Identifier
        new_using->original_name.push_back(begin->Codepoints);
        begin++; // Semicolon
        current_namespace->unnamed_contents.push_back(std::move(new_using));
        break; }
    default:
        Wide::ParserExceptionBuilder(*begin) << L"Expected '.', '=' or ';' after 'identifier' when parsing 'using'.";
    }
    if (begin->type != Wide::Lexer::TokenType::Semicolon)
        Wide::ParserExceptionBuilder(*begin) << L"Expected ';' after 'identifier'";
    CheckedIncrement(begin, end); // One-past-the-end
}

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

尽揽少女心 2024-12-21 23:43:00

LL 和 LALR 都是 O(n)，所以没关系。

然而，并非所有递归下降解析器都是 LL 的。那些不使用某种形式的回溯的 - 尝试一种产生式，当它不起作用时尝试另一种产生式，直到用尽所有可能的产生式或找到成功的解析。注意到您正在这样做并不难:-)

至于如何知道您正在构造 LL 还是 LALR 解析器 - 您可以通过了解您正在遵循的构造方法来知道它。

编辑添加：递归下降和递归上升之间的一个显着特征是过程的作用。在递归下降中，每个非终结符都有一个过程。在递归上升中，每个 LR 状态都有一个过程。为了获得后者，您几乎必须事先构建 LR 自动机（除非您经常这样做，以至于可以即时完成 - 但在这种情况下您就不会问这个问题）。您的第一个代码示例看起来像递归下降；但您没有告诉我们第二个代码示例与您的语法有何关系，因此很难告诉我们。

回复收藏 0 原文

~没有更多了~