Is it possible to have a grammar in which "keywords" can also be treated as "non-keywords"?

Posted on 2024-09-25 12:00:43

I have the following grammar in ANTLRWorks 1.4. I'm playing around with ideas for implementation of a parser in a text-adventure game creator, where the user will specify the various allowable commands for his game.

grammar test;

parse       :   cmd EOF;


cmd         :   putSyn1 gameObject inSyn1 gameObject;

putSyn1     :   Put | Place | Drop ;

inSyn1      :   In | Into | Within;


gameObject  :   det obj;

det         :   The | A | An | ;

obj          :  Word obj | Word;


Space       :       (' ' | '\t' | '\r' | '\n'){$channel=HIDDEN;};
Put         :   'put';
Place       :   'place';
Drop        :   'drop';
In          :   'in';
Into        :   'into';
Within      :   'within';
The         :   'the';
A           :   'a';
An          :   'an';

Word        :   ('a'..'z' | 'A'..'Z')+;

I'm just getting a feel for the various subtleties involved (like I did here).

This time, using ANTLR, I'm wondering if I can parse input such as:

put wood in fire place

That is, "wood" and "fire place" are the gameObjects above. However, "place" is also a synonym for "put". So this would be equally valid:

place wood in fire place

ANTLR gives me a NoViableAltException when trying to parse the last "place" token. I want to recognize "fire place" as a gameObject.

So is this sort of thing possible in ANTLR? Is it possible in grammar?

On the side, I'm working on a manual implementation that uses a weird custom data structure with bits of NFA, Dictionaries and whatnot. But I still need more time and must sacrifice a few brain cells to design the required search and insertion algorithms.

But if this is possible in ANTLR, I could just use the generated C# file, yah?

Comments (2)

蒲公英的约定 2024-10-02 12:00:43

Sure. PL/1 is famous for not having any reserved words, e.g., you can use keywords (e.g., IF) as a variable name anywhere it isn't needed as a keyword:

 IF  IF = 1  THEN  ELSE=3;  ELSE END=4;

Building a parser that does this is harder. You can't do this "simply" in the lexer, because the lexer doesn't know the context, and so can't tell whether an identifier should be treated as a keyword or not.
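To make the problem concrete, here is a minimal Python sketch of a context-free lexer that mimics the keyword-first behaviour of the grammar in the question. The function name `context_free_lex` and the token kinds are assumptions made up for this illustration; this is not ANTLR's actual lexer.

```python
import re

# Keyword spellings taken from the lexer rules in the question's grammar.
KEYWORDS = {"put", "place", "drop", "in", "into", "within", "the", "a", "an"}


def context_free_lex(text):
    """Tag each word as a keyword or a WORD, looking only at its spelling."""
    tokens = []
    for word in re.findall(r"[A-Za-z]+", text):
        # The lexer has no idea what the parser expects at this position,
        # so any word spelled like a keyword always becomes that keyword.
        kind = word.upper() if word in KEYWORDS else "WORD"
        tokens.append((kind, word))
    return tokens


for token in context_free_lex("place wood in fire place"):
    print(token)
```

The last word comes out tagged as a PLACE keyword rather than a WORD, so a rule that expects only WORD tokens inside an object name has nothing it can match there.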

There are several ways out. When an identifier-like entity is found:

1) Make the lexer ask the parser, "Do you want a keyword now?" If the answer is yes, produce a keyword. Getting the parser to cooperate here might be hard. It may also be that the parser doesn't know, because it has to see more input to decide. Consider Fortran's famous FORMAT statement:

     FORMAT ( A1, I2, ... ) X

When you see the word "FORMAT", you can't tell whether it is a keyword or an identifier; you have to scan ahead arbitrarily far to inspect X. If X is anything but an end of statement, the word FORMAT is the name of an array; if X is an end of statement, it's the FORMAT keyword and this is a FORMAT statement.

2) Emit both a keyword (if the identifier matches one) and the identifier, and make the parser try both. Most parsers won't handle this well, but GLR parsers can handle this with aplomb if designed reasonably. This handles the FORMAT problem trivially by pushing into the parser's lookahead capability. (ANTLR isn't GLR. Our DMS Software Reengineering Toolkit has exactly such a GLR parser, and we use this trick a lot).

3) Place all identifier-like things into a hash table. Use a recursive descent parser (ANTLR generates one); when that parser wants a keyword, it simply inspects the identifier it got to verify that it is the keyword it needs. If it doesn't want a keyword, it simply uses the identifier as an identifier. I don't know how to implement this trick with ANTLR since I don't use it. This won't handle the "can't decide without lookahead" case well.
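Option 3 can be sketched in a few lines of Python for the command grammar in the question. This is only an illustration under assumptions: the class `CommandParser`, the helper `parse_command`, and the rule that an object name runs until the next preposition are all made up here, and it is a hand-written parser, not ANTLR output.

```python
PUT_SYNONYMS = {"put", "place", "drop"}
IN_SYNONYMS = {"in", "into", "within"}
DETERMINERS = {"the", "a", "an"}


def tokenize(text):
    """Lexer stage: every word is a plain identifier; no keyword tokens."""
    return text.lower().split()


class CommandParser:
    def __init__(self, words):
        self.words = words
        self.pos = 0

    def peek(self):
        return self.words[self.pos] if self.pos < len(self.words) else None

    def expect_keyword(self, allowed, what):
        # The parser wants a keyword here, so it checks the word's spelling.
        word = self.peek()
        if word not in allowed:
            raise SyntaxError(f"expected {what}, got {word!r}")
        self.pos += 1
        return word

    def game_object(self, stop_words):
        # Skip an optional determiner, then take words as plain identifiers
        # until a stop word or the end of input is reached.
        if self.peek() in DETERMINERS:
            self.pos += 1
        name = []
        while self.peek() is not None and self.peek() not in stop_words:
            name.append(self.peek())
            self.pos += 1
        if not name:
            raise SyntaxError("expected an object name")
        return " ".join(name)

    def command(self):
        verb = self.expect_keyword(PUT_SYNONYMS, "a verb")
        item = self.game_object(IN_SYNONYMS)
        prep = self.expect_keyword(IN_SYNONYMS, "a preposition")
        target = self.game_object(frozenset())  # runs to end of input
        return {"verb": verb, "item": item, "prep": prep, "target": target}


def parse_command(text):
    return CommandParser(tokenize(text)).command()


print(parse_command("place wood in fire place"))
```

Because "place" is only checked against the verb list at the start of the command, the trailing "place" is free to be part of the object name. The lookahead weakness mentioned above shows up too: in this sketch, an object whose name contains "in" would be cut short at that word.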

最终幸福 2024-10-02 12:00:43

I'd handle something like this with the lexer instead of the parser -- have the lexer do a "maximum munch", so it recognizes "fire place" as a single token, and only recognizes "place" as a separate token if it's not immediately preceded by "fire".

With that, the parser doesn't have to notice that the same sequence of characters in the input happens to form all or part of two entirely separate tokens.
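A rough Python sketch of that idea, assuming the game creator can hand the lexer a list of known object names (the `OBJECT_NAMES` set, the `lex` function, and the token kinds below are hypothetical, not part of ANTLR):

```python
# Hypothetical data: in a game creator these would come from the user's game.
OBJECT_NAMES = {"wood", "fire place"}
KEYWORDS = {
    "put": "PUT", "place": "PUT", "drop": "PUT",
    "in": "IN", "into": "IN", "within": "IN",
    "the": "DET", "a": "DET", "an": "DET",
}


def lex(text, object_names=OBJECT_NAMES):
    """Maximal munch over words: prefer the longest known object name."""
    words = text.lower().split()
    phrases = {tuple(name.split()) for name in object_names}
    longest = max((len(p) for p in phrases), default=0)
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(longest, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + n])
            if candidate in phrases:
                tokens.append(("OBJECT", " ".join(candidate)))
                i += n
                break
        else:
            # No object name starts here: fall back to keyword or plain word.
            word = words[i]
            tokens.append((KEYWORDS.get(word, "WORD"), word))
            i += 1
    return tokens


print(lex("place wood in fire place"))
```

Here the leading "place" becomes a verb token while the trailing one is swallowed by the longer "fire place" match. One consequence of this design is that an object named just "place" would always win over the verb, so this only works when object names and keywords don't collide outright.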
