Interactive ANTLR
I'm trying to write a simple interactive language (using System.in as the source) with ANTLR, and I have a few problems with it. The examples I've found on the web all use a per-line loop, e.g.:
while (readline)
    result = parse(line)
    doStuff(result)
But what if I'm writing something like Pascal or SMTP, where there is a requirement such as "the first line must look like X"? I know it can be checked in doStuff, but I think that logically it is part of the syntax.
Or what if a command is split across multiple lines? I could try:
while (readline)
    lines.add(line)
    try
        result = parse(lines)
        lines = []
        doStuff(result)
    catch
        nop
But this way I'm also hiding real errors.
Or I could reparse all the lines every time, but:
- it will be slow
- there are instructions I don't want to run twice
Can this be done with ANTLR or, if not, with something else?
Comments (4)
Yes, ANTLR can do this. Perhaps not out of the box, but with a bit of custom code, it sure is possible. You also don't need to re-parse the entire token stream for it.
Let's say you want to parse, line by line, a very simple language where each line is either a program declaration, a uses declaration, or a statement.
It should always start with a program declaration, followed by zero or more uses declarations, followed by zero or more statements. uses declarations cannot come after statements, and there can't be more than one program declaration.
For simplicity, a statement is just a simple assignment: a = 4 or b = a.
An ANTLR grammar for such a language could look like this:
But, of course, we'll need to add a couple of checks. Also, by default, a parser takes a token stream in its constructor, but since we're planning to trickle tokens into the parser line by line, we'll need to create a new constructor in our parser. You can add custom members to your parser or lexer class by putting them in a @parser::members { ... } or @lexer::members { ... } section, respectively. We'll also add a couple of boolean flags to keep track of whether the program declaration has already happened and whether uses declarations are allowed. Finally, we'll add a process(String source) method which, for each new line, creates a lexer that gets fed to the parser.
All of that would look like:
Now, inside our grammar, we're going to check through a couple of gated semantic predicates whether we're parsing declarations in the correct order. And after parsing a certain declaration or statement, we'll want to flip certain boolean flags to allow or disallow declarations from then on. The flipping of these boolean flags is done in each rule's @after { ... } section, which gets executed (not surprisingly) after the tokens from that parser rule are matched.
Your final grammar file now looks like this (including some System.out.println's for debugging purposes):
which can be tested with the following class:
To run this test class, do the following:
As you can see, you can only declare a program once:
uses cannot come after statements:
and you must start with a program declaration:
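The flag-based ordering checks described in this answer can be illustrated without ANTLR at all. The following is only a minimal plain-Java sketch, not the answer's grammar or generated parser: the class name LineOrderChecker, the run helper, and the assumed line shapes ("program Name", "uses Name", "x = 4") are hypothetical. Only the idea of two boolean flags and a process(String source) method called once per line comes from the text above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the parser: enforces declaration order with two boolean flags. */
class LineOrderChecker {
    private boolean programDeclared = false; // becomes true once a program line is accepted
    private boolean usesAllowed = false;     // true only between program and the first statement

    /** Checks a single line and returns what it was, or an error message. */
    String process(String source) {
        String line = source.trim();
        // Assumed syntax: "program Name"
        if (line.matches("program\\s+[A-Za-z_]\\w*")) {
            if (programDeclared) return "error: program already declared";
            programDeclared = true;
            usesAllowed = true;
            return "program";
        }
        if (!programDeclared) return "error: must start with a program declaration";
        // Assumed syntax: "uses Name"
        if (line.matches("uses\\s+[A-Za-z_]\\w*")) {
            return usesAllowed ? "uses" : "error: uses not allowed after a statement";
        }
        // A statement is a simple assignment such as "a = 4" or "b = a"
        if (line.matches("[A-Za-z_]\\w*\\s*=\\s*(\\d+|[A-Za-z_]\\w*)")) {
            usesAllowed = false; // from now on, uses declarations are rejected
            return "statement";
        }
        return "error: unrecognized line";
    }

    /** Feeds the lines one at a time through a fresh checker and collects the results. */
    static List<String> run(String... lines) {
        LineOrderChecker checker = new LineOrderChecker();
        List<String> results = new ArrayList<>();
        for (String line : lines) {
            results.add(checker.process(line));
        }
        return results;
    }

    public static void main(String[] args) {
        for (String result : run("program A", "uses B", "a = 4", "uses C", "program D")) {
            System.out.println(result);
        }
    }
}
```

In the ANTLR version the same flags would live in the parser's members section and be consulted by the predicates, so the ordering rules stay with the grammar rather than in doStuff.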
Here's an example of how to parse input from System.in without first manually parsing it one line at a time and without making major compromises in the grammar. I'm using ANTLR 3.4. ANTLR 4 may have addressed this problem already. I'm still using ANTLR 3, though, and maybe someone else with this problem still is too.
Before getting into the solution, here are the hurdles I ran into that keep this seemingly trivial problem from being easy to solve:
- The built-in CharStream classes consume the entire stream of data up-front. Obviously an interactive mode (or any other indeterminate-length stream source) can't provide all the data.
- BufferedTokenStream and derived class(es) will not end on a skipped or off-channel token. In an interactive setting, this means that the current statement can't end (and therefore can't execute) until the first token of the next statement or EOF has been consumed when using one of these classes.
Consider a simple example:
Interactively parsing a single statement (and only a single statement) isn't possible. Either the next statement has to be started (that is, hitting "verb" in the input), or the grammar has to be modified to mark the end of the statement, e.g. with a ';'.
[...] replace my $channel = HIDDEN with skip(), but it's still a limitation worth mentioning.
For example, my grammar's normal entry point is this rule:
My interactive session can't start at the script rule because it won't end until EOF. But it can't start at statement either because STMTS might be used by my tree parser.
So I introduced the following rule specifically for an interactive session:
In my case, there are no "first line" rules, so I can't say how easy or hard it would be to do something similar for them. It may be a matter of making a rule like so and executing it at the beginning of the interactive session:
The first problem mentioned, the limitations of the built-in CharStream classes, was my only major hang-up. ANTLRStringStream has all the workings that I need, so I derived my own CharStream class from it. The base class's data member is assumed to hold all the characters read so far, so I needed to override all the methods that access it. Then I changed the direct read to a call to a new method, dataAt, to manage reading from the stream. That's basically all there is to this. Please note that the code here may have unnoticed problems and does no real error handling.
Launching an interactive session is similar to the boilerplate parsing code, except that UnbufferedTokenStream is used and the parsing takes place in a loop:
Still with me? Okay, well that's it. :)
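To make the dataAt idea more concrete, here is a minimal sketch of a buffer that pulls characters from a Reader only when a given index is actually requested. It is not the answer's class and does not extend ANTLRStringStream: LazyCharBuffer, charsRead, and the EOF constant are hypothetical names, and only dataAt is taken from the description above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical buffer that reads from the underlying Reader only on demand. */
class LazyCharBuffer {
    static final int EOF = -1;

    private final Reader reader;
    private final StringBuilder data = new StringBuilder(); // every character read so far
    private boolean exhausted = false;                      // true once the reader hit end of input

    LazyCharBuffer(Reader reader) {
        this.reader = reader;
    }

    /** Returns the character at index, reading just enough input to reach it, or EOF. */
    int dataAt(int index) {
        if (index < 0) {
            throw new IndexOutOfBoundsException("negative index: " + index);
        }
        while (!exhausted && index >= data.length()) {
            try {
                int c = reader.read(); // blocks until one more character is available
                if (c < 0) {
                    exhausted = true;
                } else {
                    data.append((char) c);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return index < data.length() ? data.charAt(index) : EOF;
    }

    /** Number of characters pulled from the reader so far. */
    int charsRead() {
        return data.length();
    }

    public static void main(String[] args) {
        // A StringReader stands in for new InputStreamReader(System.in).
        LazyCharBuffer buffer = new LazyCharBuffer(new StringReader("verb noun"));
        System.out.println((char) buffer.dataAt(0)); // prints v
        System.out.println(buffer.charsRead());      // prints 1
    }
}
```

A stream class built this way would route every lookahead and consume operation through dataAt, so input is requested only as far as the lexer needs to look.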
If you are using System.in as source, which is an input stream, why not just have ANTLR tokenize the input stream as it is read and then parse the tokens?
You have to put it in doStuff...
For instance, if you're declaring a function, the parse would return a function, right? Without a body, and that's fine, because the body will come later. You'd do what most REPLs do.