语法:自上而下和自下而上的区别?

发布于 2024-09-07 22:36:48 字数 33 浏览 7 评论 0原文

自上而下和自下而上语法有什么区别?举个例子就太好了。

What is the difference between a top down and bottom up grammar? An example would be awesome.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

多情出卖 2024-09-14 22:36:48

首先,语法本身不是自上而下或自下而上的,解析器是(尽管有些语法可以被其中一个解析,但不能被另一个解析)。

从实践的角度来看,主要区别在于大多数手写解析器是自上而下的,而更大比例的机器生成的解析器是自下而上的(当然,相反也是可能的)。

自上而下的解析器通常使用递归下降,这通常意味着类似这样的结构(使用典型的数学表达式作为示例):

expression() { term() [-+] expression }
term() { factor() [*/] term() }
factor() { operand() | '(' expression() ')' }

自下而上的解析器以相反的方向工作 - 递归下降解析器从完整表达式开始,并将其分解为越来越小的部分,直到达到单个标记的级别,自下而上的解析器从单个标记开始,并使用有关这些标记如何组合成越来越高的表达式层次结构的规则表直到它到达顶层(上面表示为“表达式”)。

编辑:为了澄清,也许添加一个非常简单的解析器是有意义的。在这种情况下,我将执行旧的经典操作,将典型数学表达式的简化版本从中缀转换为后缀:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void expression(void);

void show(int ch) { 
    putchar(ch);
    putchar(' ');
}

int token() { 
    int ch;
    while (isspace(ch=getchar()))
        ;
    return ch;
}

void factor() { 
    int ch = token();
    if (ch == '(') {
        expression();
        ch = token();
        if (ch != ')') {
            fprintf(stderr, "Syntax error. Expected close paren, found: %c\n", ch);
            exit(EXIT_FAILURE);
        }
    }
    else
        show(ch);
}

void term() { 
    int ch;
    factor();
    ch = token();
    if (ch == '*' || ch == '/') {
        term();
        show(ch);
    }
    else
        ungetc(ch, stdin);
}

void expression() {
    int ch;
    term();
    ch = token();
    if (ch == '-' || ch=='+') {
        expression();
        show(ch);
    }
    else 
        ungetc(ch, stdin);
}

int main(int argc, char **argv) {
    expression();
    return 0;
}

请注意,这里的词法分析非常愚蠢(它基本上只接受单个字符作为标记)和表达式允许的数量非常有限(仅+-*/)。 OTOH,它足以处理如下输入:

1+2*(3+4*(5/6)) ,

从中它确实会产生我认为正确的输出:

1 2 3 4 5 6 / * + * +

First of all, the grammar itself isn't top-down or bottom-up, the parser is (though there are grammars that can be parsed by one but not the other).

From a practical viewpoint, the main difference is that most hand-written parsers are top-down, while a much larger percentage of machine-generated parsers are bottom-up (though, of course, the reverse is certainly possible).

A top-down parser typically uses recursive descent, which typically means a structure something like this (using typical mathematical expressions as an example):

expression() { term() [-+] expression }
term() { factor() [*/] term() }
factor() { operand() | '(' expression() ')' }

A bottom-up parser work in the reverse direction -- where a recursive descent parser starts from the full expression, and breaks it down into smaller and smaller pieces until it reaches the level of individual tokens, a bottom-up parser starts from the individual tokens, and uses tables of rules about how those tokens fit together into higher and higher levels of the expression hierarchy until it reaches the top level (what's represented as "expression" above).

Edit: To clarify, perhaps it would make sense to add a really trivial parser. In this case, I'll just do the old classic of converting a simplified version of a typical mathematical expression from infix to postfix:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void expression(void);

void show(int ch) { 
    putchar(ch);
    putchar(' ');
}

int token() { 
    int ch;
    while (isspace(ch=getchar()))
        ;
    return ch;
}

void factor() { 
    int ch = token();
    if (ch == '(') {
        expression();
        ch = token();
        if (ch != ')') {
            fprintf(stderr, "Syntax error. Expected close paren, found: %c\n", ch);
            exit(EXIT_FAILURE);
        }
    }
    else
        show(ch);
}

void term() { 
    int ch;
    factor();
    ch = token();
    if (ch == '*' || ch == '/') {
        term();
        show(ch);
    }
    else
        ungetc(ch, stdin);
}

void expression() {
    int ch;
    term();
    ch = token();
    if (ch == '-' || ch=='+') {
        expression();
        show(ch);
    }
    else 
        ungetc(ch, stdin);
}

int main(int argc, char **argv) {
    expression();
    return 0;
}

Note that the lexing here is pretty stupid (it basically just accepts a single character as a token) and the expressions allowed are quite limited (only +-*/). OTOH, it's good enough to handle an input like:

1+2*(3+4*(5/6))

from which it does produce what I believe is correct output:

1 2 3 4 5 6 / * + * +

深海少女心 2024-09-14 22:36:48

据我所知,它对语法本身没有任何影响,但对解析器有任何影响。

维基百科对 bottom-up自顶向下解析

一般来说(恕我直言)更直观的方法是自上而下。您从开始符号开始并应用适合的转换规则,而自下而上则需要向后应用转换规则(这通常让我很头疼)。

Afaik it doesn't make any difference for the grammar itself, but it does for the parser.

Wikipedia has a quite lengthy explanation of both bottom-up and top-down parsing.

Generally the (imho) more intuitive way is top-down. You start with the start symbol and apply the transformation rules that fit, while with bottom-up you need to apply transformation rules backwards (which usually created quite a headache for me).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文