当前位置：文江博客话题详情

语法：自上而下和自下而上的区别？

发布于 2024-09-07 22:36:48 字数 33 浏览 12 评论 0原文

自上而下和自下而上语法有什么区别？举个例子就太好了。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

多情出卖 2024-09-14 22:36:48

首先，语法本身不是自上而下或自下而上的，解析器是（尽管有些语法可以被其中一个解析，但不能被另一个解析）。

从实践的角度来看，主要区别在于大多数手写解析器是自上而下的，而更大比例的机器生成的解析器是自下而上的（当然，相反也是可能的）。

自上而下的解析器通常使用递归下降，这通常意味着类似这样的结构（使用典型的数学表达式作为示例）：

expression() { term() [-+] expression }
term() { factor() [*/] term() }
factor() { operand() | '(' expression() ')' }

自下而上的解析器以相反的方向工作 - 递归下降解析器从完整表达式开始，并将其分解为越来越小的部分，直到达到单个标记的级别，自下而上的解析器从单个标记开始，并使用有关这些标记如何组合成越来越高的表达式层次结构的规则表直到它到达顶层（上面表示为“表达式”）。

编辑：为了澄清，也许添加一个非常简单的解析器是有意义的。在这种情况下，我将执行旧的经典操作，将典型数学表达式的简化版本从中缀转换为后缀：

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void expression(void);

void show(int ch) { 
    putchar(ch);
    putchar(' ');
}

int token() { 
    int ch;
    while (isspace(ch=getchar()))
        ;
    return ch;
}

void factor() { 
    int ch = token();
    if (ch == '(') {
        expression();
        ch = token();
        if (ch != ')') {
            fprintf(stderr, "Syntax error. Expected close paren, found: %c\n", ch);
            exit(EXIT_FAILURE);
        }
    }
    else
        show(ch);
}

void term() { 
    int ch;
    factor();
    ch = token();
    if (ch == '*' || ch == '/') {
        term();
        show(ch);
    }
    else
        ungetc(ch, stdin);
}

void expression() {
    int ch;
    term();
    ch = token();
    if (ch == '-' || ch=='+') {
        expression();
        show(ch);
    }
    else 
        ungetc(ch, stdin);
}

int main(int argc, char **argv) {
    expression();
    return 0;
}

请注意，这里的词法分析非常愚蠢（它基本上只接受单个字符作为标记）和表达式允许的数量非常有限（仅+-*/）。 OTOH，它足以处理如下输入：

1+2*(3+4*(5/6)) ，

从中它确实会产生我认为正确的输出：

1 2 3 4 5 6 / * + * +

First of all, the grammar itself isn't top-down or bottom-up, the parser is (though there are grammars that can be parsed by one but not the other).

From a practical viewpoint, the main difference is that most hand-written parsers are top-down, while a much larger percentage of machine-generated parsers are bottom-up (though, of course, the reverse is certainly possible).

A top-down parser typically uses recursive descent, which typically means a structure something like this (using typical mathematical expressions as an example):

expression() { term() [-+] expression }
term() { factor() [*/] term() }
factor() { operand() | '(' expression() ')' }

A bottom-up parser work in the reverse direction -- where a recursive descent parser starts from the full expression, and breaks it down into smaller and smaller pieces until it reaches the level of individual tokens, a bottom-up parser starts from the individual tokens, and uses tables of rules about how those tokens fit together into higher and higher levels of the expression hierarchy until it reaches the top level (what's represented as "expression" above).

Edit: To clarify, perhaps it would make sense to add a really trivial parser. In this case, I'll just do the old classic of converting a simplified version of a typical mathematical expression from infix to postfix:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void expression(void);

void show(int ch) { 
    putchar(ch);
    putchar(' ');
}

int token() { 
    int ch;
    while (isspace(ch=getchar()))
        ;
    return ch;
}

void factor() { 
    int ch = token();
    if (ch == '(') {
        expression();
        ch = token();
        if (ch != ')') {
            fprintf(stderr, "Syntax error. Expected close paren, found: %c\n", ch);
            exit(EXIT_FAILURE);
        }
    }
    else
        show(ch);
}

void term() { 
    int ch;
    factor();
    ch = token();
    if (ch == '*' || ch == '/') {
        term();
        show(ch);
    }
    else
        ungetc(ch, stdin);
}

void expression() {
    int ch;
    term();
    ch = token();
    if (ch == '-' || ch=='+') {
        expression();
        show(ch);
    }
    else 
        ungetc(ch, stdin);
}

int main(int argc, char **argv) {
    expression();
    return 0;
}

Note that the lexing here is pretty stupid (it basically just accepts a single character as a token) and the expressions allowed are quite limited (only +-*/). OTOH, it's good enough to handle an input like:

1+2*(3+4*(5/6))

from which it does produce what I believe is correct output:

1 2 3 4 5 6 / * + * +

回复收藏 0 原文