How do I make Bison/YACC not recognize a command until it has parsed the whole string?

Posted on 2024-08-27 19:10:39

I have some Bison grammar:

input: /* empty */
       | input command
;

command:
        builtin
        | external
;

builtin:
        CD { printf("Changing to home directory...\n"); }
        | CD WORD { printf("Changing to directory %s\n", $2); }
;

I'm wondering how I get Bison to not accept (YYACCEPT?) something as a command until it reads ALL of the input. So I can have all these rules below that use recursion or whatever to build things up, which either results in a valid command or something that's not going to work.

One simple test I'm doing with the code above is just entering "cd mydir mydir". Bison parses CD and WORD and goes "hey! this is a command, put it to the top!". Then the next token it finds is just WORD, which has no rule, and then it reports an error.

I want it to read the whole line and realize CD WORD WORD is not a rule, and then report an error. I think I'm missing something obvious and would greatly appreciate any help - thanks!

Also - I've tried using input command NEWLINE or something similar, but it still pushes CD WORD to the top as a command and then parses the extra WORD separately.

岁月蹉跎了容颜 2024-09-03 19:10:39

Sometimes I deal with these cases by flattening my grammars.

In your case, it might make sense to add tokens to your lexer for newlines and command separators (;) and put them explicitly in your Bison grammar. The parser will then expect a full line of input before it accepts anything as a command.

sep:   NEWLINE | SEMICOLON
   ;

command:  CD  sep
   |  CD WORD sep
   ;

Or, for an arbitrary list of arguments like a real shell:

args:
    /* empty */
  | args WORD
  ;

command:
      CD args sep
   ;
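
The lexer side of this suggestion could look roughly like the sketch below. It assumes a hand-written tokenizer in C rather than a Lex specification. The names struct lexer and next_token, and the numeric token values, are hypothetical: in a real build the token codes come from the header Bison generates from the %token declarations, and the function would be called yylex. The point is only that '\n' and ';' come back as tokens of their own instead of being skipped as whitespace.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes; normally generated by Bison from %token. */
enum token { TOK_EOF = 0, CD = 258, WORD, NEWLINE, SEMICOLON };

/* Hypothetical lexer state: the input string and the current offset. */
struct lexer {
    const char *src;
    size_t pos;
    char text[64];   /* text of the most recent CD/WORD token */
};

/* Return the next token; 0 means end of input, as yylex() would. */
int next_token(struct lexer *lx)
{
    assert(lx != NULL && lx->src != NULL);

    /* Skip blanks, but not '\n': the newline is a token of its own. */
    while (lx->src[lx->pos] == ' ' || lx->src[lx->pos] == '\t')
        lx->pos++;

    char c = lx->src[lx->pos];
    if (c == '\0')
        return TOK_EOF;
    if (c == '\n') { lx->pos++; return NEWLINE; }
    if (c == ';')  { lx->pos++; return SEMICOLON; }

    /* Anything else is a word: read up to the next blank or separator.
       Words longer than the buffer are truncated in this sketch. */
    size_t len = 0;
    while (c != '\0' && c != ' ' && c != '\t' && c != '\n' && c != ';') {
        if (len + 1 < sizeof lx->text)
            lx->text[len++] = c;
        c = lx->src[++lx->pos];
    }
    lx->text[len] = '\0';

    return strcmp(lx->text, "cd") == 0 ? CD : WORD;
}
```

For the input "cd mydir mydir" followed by a newline, this produces CD, WORD, WORD, NEWLINE and then end of input, so the grammar sees the separator explicitly.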

潇烟暮雨 2024-09-03 19:10:39

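Instead of performing the actions directly, build yourself an abstract syntax tree first. Then, depending on the result and your preferences, you either execute parts of it or nothing at all. If a parse error occurs while the tree is being built, you may want to use the %destructor directive to tell Bison how to clean up.

This is actually the proper way to do it, since you keep full control over the content and the logic and leave Bison to take care of the parsing only.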
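
As a rough illustration of that idea, here is a minimal C sketch of the tree-building side. Everything in it (struct command, command_new, command_add_arg, command_free, command_is_valid, line_finish) is hypothetical and not part of Bison. It assumes the grammar actions build a linked list of commands instead of calling printf, and that the caller passes in the return value of yyparse(), which is 0 on success.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical AST node: one parsed command and its arguments. */
struct command {
    char *name;
    char **args;
    size_t nargs;
    struct command *next;   /* next command on the same line */
};

/* Copy a string onto the heap (strdup is not in ISO C before C23). */
static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* A grammar action would call this instead of acting immediately. */
struct command *command_new(const char *name)
{
    assert(name != NULL);
    struct command *cmd = calloc(1, sizeof *cmd);
    if (cmd == NULL)
        return NULL;
    cmd->name = copy_string(name);
    if (cmd->name == NULL) {
        free(cmd);
        return NULL;
    }
    return cmd;
}

/* Append one argument; returns 0 on success, -1 on allocation failure. */
int command_add_arg(struct command *cmd, const char *arg)
{
    char **grown = realloc(cmd->args, (cmd->nargs + 1) * sizeof *grown);
    if (grown == NULL)
        return -1;
    cmd->args = grown;
    cmd->args[cmd->nargs] = copy_string(arg);
    if (cmd->args[cmd->nargs] == NULL)
        return -1;
    cmd->nargs++;
    return 0;
}

/* Free a whole list; a %destructor could call this for discarded values. */
void command_free(struct command *cmd)
{
    while (cmd != NULL) {
        struct command *next = cmd->next;
        for (size_t i = 0; i < cmd->nargs; i++)
            free(cmd->args[i]);
        free(cmd->args);
        free(cmd->name);
        free(cmd);
        cmd = next;
    }
}

/* In this sketch, cd is valid with zero or one argument. */
int command_is_valid(const struct command *cmd)
{
    return strcmp(cmd->name, "cd") != 0 || cmd->nargs <= 1;
}

/* Called after the parser returns. Accepts the line only if parsing
   succeeded (status 0) and every command is valid. Returns the number
   of accepted commands and frees the list either way. */
size_t line_finish(struct command *list, int parse_status)
{
    size_t accepted = 0;
    int ok = (parse_status == 0);

    for (struct command *c = list; ok && c != NULL; c = c->next)
        ok = command_is_valid(c);
    for (struct command *c = list; ok && c != NULL; c = c->next)
        accepted++;   /* a real shell would execute c here */

    command_free(list);
    return accepted;
}
```

With this split, "cd mydir mydir" can be parsed in full by a permissive grammar and then rejected as a whole in line_finish, which is the behavior the question asks for. Checking validity after the parse rather than in the grammar is a design choice of this sketch, not something Bison requires.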

五里雾 2024-09-03 19:10:39

Usually, things aren't done the way you describe.

With Bison/Yacc/Lex, one usually designs the syntax carefully to do exactly what is needed. Because Lex matches its regular expressions greedily, this should help you.

So, how about this instead?

Since you are parsing whole lines at a time, I think we can use this fact to our advantage and revise the syntax.

input : /* empty */
      | line


command-break : command-break semi-colon
              | semi-colon

line : commands new-line

commands : commands command-break command
         | commands command-break command command-break
         | command
         | command command-break

...

Where new-line and semi-colon are tokens defined in your lex source as something like \n and ; respectively. This should give you the UNIX-style syntax for commands that you are looking for. All sorts of things are possible. The grammar is a little bloated, since it allows multiple semicolons, and it doesn't take white-space into consideration, but you should get the idea.

Lex and Yacc are powerful tools, and I find them quite enjoyable, at least when you aren't on a deadline.
