语法树

发布于 2024-10-06 15:25:11 字数 311 浏览 0 评论 0原文

我在大学里有一个实验室。但我不明白我该怎么做。 我有某种语言的语法(例如算术表达式语法)。我必须建立这个语法树(我不知道如何)。那么我必须确定,输入的句子是否是该语言的句子?我该怎么做呢? PS我读过《龙之书》的几章,但没有找到我需要的东西。 PPS 我不能使用 lex/flex 和 yacc/bison。


编辑抱歉!我应该更加细心。实际上,我有一个语法,如果可能的话,我必须使用这个语法构建树并输入句子(我知道如何做到这一点)(在另一种情况下,我必须显示有关它的消息)。有什么简单易懂的算法吗?我可以使用任何简单语言的语法,例如算术表达式。

I have a lab in the university. But I don't understand how I can do it.
I have a grammar of some language(for example, arithmetic expressions grammar). I must build tree of this grammar(I don't know how). Then I must determine, whether input sentence is sentence of this language? How can I do it?
P.S. I've read few chapters of Dragon Book, but I havn't found anything I need.
P.P.S. I can't use lex/flex and yacc/bison.


EDIT Sorry! I shuild be more attentive. Really I have a grammar and I must build tree using this grammar and input sentence(I understand how to do it) if it possible(in another case I must show message about it). Is there any simple for understanding algorithms for it? I can use grammar of any simple language like arithmetical expressions.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

稍尽春風 2024-10-13 15:25:11

我假设这是理论上的,因为它是家庭作业,而不是您此时实际需要编码的东西。 (Kendrick 的答案涵盖了代码方法)

基本思想是从 BNF 起始变量开始,并尝试找出如何扩展它,一次应用一个规则,看看是否可以得出输入序列。

对于如下所示的规则集:

(1) start: expression

(2) expression: expression '+' term
(3)           | expression '-' term
(4)           | term

(5) term: 'a'
(6)     | 'b'

给定表达式 a + b - a,您将执行如下操作

开始(给定)
b. 表达式 (1)
c. 表达式“-”项 (3)
d. 表达式 '-' 'a' (5)
e. 表达式 '+' 项 '-' 'a' (2)
f. 术语“+”术语“-”“a” (4)
g。 'a' '+' 项 '-' 'a' (5)
h. 'a' '+' 'b' '-' 'a' (6)

这就是您一次一步执行的方法...现在的诀窍是,您需要追踪所有你的规则调用。所以你真正的树看起来像这样:

                          start
                            | (b)
                        expression
                        /   |   \ (c)
               expression  '-'  term
                /  |  \ (e)       | (d)
       expression '+' term       'a'
           | (f)        | (h)
          term         'b'
           | (g)
          'a'

一开始有点复杂,但是一旦你真正看到它是如何完成的,就不难上手了。

注意:有些人发现逆向工作更容易,从您的输入开始,然后反向应用规则来尝试找到您的起始表达式。当您编写解析器时,您将不可避免地需要在某种程度上遵循此路线。

编辑:我在上面使用小写字母列出了所有表达式步骤,然后表明树中的每组分支实际上对应于一个规则应用程序。可能有帮助,也可能没有帮助。

I'm assuming this is theoretical since it's homework, and not something you actually need to code at this point. (Kendrick's answer covers the code approach)

The basic idea is to start with your BNF start variable and try to figure out how to expand it, applying rules one at a time, to see if you can come up with your input sequence.

For a ruleset like the following:

(1) start: expression

(2) expression: expression '+' term
(3)           | expression '-' term
(4)           | term

(5) term: 'a'
(6)     | 'b'

Given the expression a + b - a, you would go something like this:

a. start (given)
b. expression (1)
c. expression '-' term (3)
d. expression '-' 'a' (5)
e. expression '+' term '-' 'a' (2)
f. term '+' term '-' 'a' (4)
g. 'a' '+' term '-' 'a' (5)
h. 'a' '+' 'b' '-' 'a' (6)

So that's how you do it one step at a time... Now the trick is, you need to trace out all your rule invocations. So your real tree would look something like this:

                          start
                            | (b)
                        expression
                        /   |   \ (c)
               expression  '-'  term
                /  |  \ (e)       | (d)
       expression '+' term       'a'
           | (f)        | (h)
          term         'b'
           | (g)
          'a'

It's a little complicated at first, but once you actually see how it's done it's not too hard to pick up.

Note: Some people find it easier to work backwards, starting with your input and then applying rules in reverse to try to find your start expression. When you write a parser, you will inevitably need to follow this route on some level.

EDIT: I listed all the expression steps using lowercase letters above, and then showed that each set of branches from the tree actually corresponds with one of the rule applications. May or may not help.

过去的过去 2024-10-13 15:25:11

我已经很长时间没有这样做了,所以我将给你一个简短的理论答案。我相信其他人会有更深入的解释。

由于您还没有说出您做了什么(并且您应该将其作为将来问题的一部分),因此第一步是对输入进行标记。

获得令牌后,您可以构建一个状态机来遍历令牌并将它们推送到所需的树结构中。这应该在你的书中涵盖(我还没有读过),但你可能想从在纸质(或电子)上构建模型开始,并考虑每个步骤的有效输入。相等语句的有效左手值是多少?如果你在树上有令牌 x,你可以从这里去哪里?

如果您无法从当前状态确定下一步该去哪里,那么您可能需要在状态机中实现前瞻(是否需要这取决于您的语言的复杂性)。

再说一次,我已经有十多年没有这样做了,所以我的记忆很模糊。希望我已经为您提供了足够的框架来完善您的答案,或者为其他对此主题更了解的人提供一个框架来批评我的错误或过时(从而得出您作为一个团队正在寻找的答案) )。

It's been a long time since I've done this, so I'm going to give you the short theoretical answer. I'm sure someone else will have a more in-depth explaination.

Since you haven't said what you've done (and you should as part of your questions in the future), the first step is to tokenize the input.

Once you have the tokens, you can build a state machine to walk the tokens and push them into the desired tree structure. This should be covered in your book (I haven't read it) but you probably want to start by building a model on paper (or electronic) and consider valid inputs from each step. What's a valid left hand value for an equality statement? If you have token x on the tree, where can you go from here?

If you can't determine from the current state where to go next, then you may need to implement look-ahead in your state machine (whether this is required depends on the complexity of your language).

Again, I haven't done this in over 10 years, so my recollection is fuzzy. Hopefully I've given enough of a framework for you to refine your answer or give others who are more knowledgeable on the subject to roast me for being wrong or out of date (and thus arriving at the answer you're looking for as a group).

菊凝晚露 2024-10-13 15:25:11

我想回答这个问题,但实际上没有足够的材料可供使用 - 我留下了这些问题:

  • 由于这是一个大学实验室,所以没有附带的学习材料或课程吗?另外,你还有多少时间?
  • 到目前为止你尝试过什么?
  • 您能否将给定的语法硬编码到您的程序中,或者您的程序是否需要能够从文件中读取任何语法并从中构建解析树?
  • 语法有多复杂;具体来说:它们是递归的吗?
  • 语法标记是单字符还是多字符?

编辑:考虑到评论和更新的问题:我认为最直接的方法是递归下降解析器,它在输入匹配时构建树。查找 Niklaus Wirth 的“编译器构建”一书,该书已在网络上以 PDF 形式提供,其中描述了一种简单语言的 RDP。

递归下降解析器的基本思想是语法中的每个规则都对应一个函数,每个函数检查下一个标记并调用相应的下一个解析函数。例如,如果您的语法有规则

表达式:常量'+'常量

则关联的解析方法将如下所示:

void parseExpression() {
    parseConstant();
    Token token = peek_at_next_token();
    if (token == '+') {
        parseConstant();
        ... tree building code here ...
    }
    else
        throw ParseException("Expected '+', found "+token);
}

但另请参阅此问题的其他答案。

I'd like to answer this, but there isn't really enough material to work with - I am left with these questions:

  • As this is a university lab, isn't there accompanying study material or classes? Also, how much time to do you have?
  • What have you tried so far?
  • Can you hard-code a given grammar into your program, or does your program need to be able to read any grammar from a file and build a parse-tree from that?
  • How complex are the grammars; specifically: are they recursive?
  • Are the grammar tokens single-character or multi-character?

Edit: Taking the comment and the updated question into account: I think the most straight-forward approach would be a recursive descent parser, which builds the tree as the input is matched. Look for Niklaus Wirth's "Compiler Construction" book, which has been made available as PDF on the web, and which describes a RDP for a simple language.

The basic idea of a recursive descent parser is that every rule in the grammar corresponds to a function, and each function checks the next token and calls the corresponding next-lower parse function. For example, if your grammar has a rule

expression : constant '+' constant

the associated parsing method would look like this:

void parseExpression() {
    parseConstant();
    Token token = peek_at_next_token();
    if (token == '+') {
        parseConstant();
        ... tree building code here ...
    }
    else
        throw ParseException("Expected '+', found "+token);
}

But see also the other answers to this question.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文