ANTLR 2.7: getting a stream of objects from the parser

Posted 11-26 18:19

I'm using ANTLR 2.7.6 to parse the messy output of another application. Sadly, I cannot upgrade to ANTLR 3, even though it has been out for quite a while. A log file of the sort I will be parsing is better thought of as a list of objects than as a tree of objects, and it can be very large (>100 MB), so reading it all into one AST is not practical. (My application is multithreaded and will process half a dozen to a dozen of these files at once, so memory would fill up quickly.) I want to read these objects out as if from a stream, so that I can process them one by one. Note that the objects themselves can be thought of as small trees. Is there a way to get my ANTLR parser to act like an object stream, an iterator, or something of that nature?

[See Javadoc for ANTLR 2.]

Edit: Here is a conceptual example of what I would like to do with the parser.

import java.io.FileReader;
import antlr.TokenStream;
import antlr.CharBuffer;
//...
FileReader fileReader = new FileReader(filepath);
TokenStream lexer = new MyExampleLexer(new CharBuffer(fileReader));
MyExampleParser parser = new MyExampleParser(lexer);
// Conceptual only: the generated parser is not Iterable, so this loop does not compile as-is.
for (Object obj : parser)
{
    processObject(obj);
}

Am I perhaps working with the wrong paradigm for using an ANTLR parser? (I realize that the parser does not implement Iterator, but conceptually that is the sort of behavior I'm looking for.)

Comments (1)

谁许谁一生繁华 2024-12-03 18:19:29

AFAIK, ANTLR v2.x creates tokens on demand rather than all at once. The parser takes a TokenBuffer, which in turn takes a TokenStream (see http://www.antlr2.org/javadoc/antlr/TokenBuffer.html#TokenBuffer%28antlr.TokenStream%29). The parser then polls this TokenStream through its nextToken() method whenever it needs more tokens.

In other words, if you provide the input source as a file, ANTLR does not read the entire file and tokenize it up front; tokens are created (and discarded) only as they are needed.
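
To make the pull model concrete, here is a minimal sketch in plain Java that does not use ANTLR at all. The names CountingReader and LazyTokenSource are made up for illustration and are not ANTLR classes. The point is only that a tokenizer polled through a nextToken()-style method reads as much input as the next token requires, which is the behavior described above, assuming ANTLR 2.x works the way I think it does.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class LazyTokenDemo {

    /** Illustrative helper: wraps a Reader and counts the characters pulled from it. */
    static final class CountingReader extends Reader {
        private final Reader in;
        private int charsRead = 0;

        CountingReader(Reader in) { this.in = in; }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            int n = in.read(buf, off, len);
            if (n > 0) charsRead += n;
            return n;
        }

        @Override
        public void close() throws IOException { in.close(); }

        int charsRead() { return charsRead; }
    }

    /** Hypothetical pull-based tokenizer: one whitespace-separated word per call. */
    static final class LazyTokenSource {
        private final Reader in;

        LazyTokenSource(Reader in) { this.in = in; }

        /** Returns the next token, or null at end of input. */
        String nextToken() throws IOException {
            StringBuilder sb = new StringBuilder();
            int c;
            // Skip leading whitespace; stop at the first token character or EOF.
            while ((c = in.read()) != -1 && Character.isWhitespace(c)) {
                // nothing to do, just consume
            }
            // Collect characters until the next whitespace or EOF.
            while (c != -1 && !Character.isWhitespace(c)) {
                sb.append((char) c);
                c = in.read();
            }
            return sb.length() == 0 ? null : sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        CountingReader reader =
                new CountingReader(new StringReader("1 2 3\n4 5 6 7 8\n9 10\n"));
        LazyTokenSource tokens = new LazyTokenSource(reader);
        // Only the characters needed for each token are consumed from the input.
        System.out.println(tokens.nextToken() + " after " + reader.charsRead() + " chars");
        System.out.println(tokens.nextToken() + " after " + reader.charsRead() + " chars");
    }
}
```

A real lexer typically adds some lookahead and buffering on top of this, so the exact number of characters consumed per token would differ, but the input would still be read incrementally.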

Note that I have never worked with ANTLR 2.x, so I could be wrong. Have you observed something different? If so, how do you supply the source to ANTLR: as a file, or as one big string? If it's the latter, I recommend providing a file instead.

EDIT

Let's say you want to parse a file that consists of lines of numbers separated by whitespace (which you want to ignore). You also want your parser to process the file line by line, because collecting all the numbers at once would cause memory problems.

You can do this by letting your main parser rule, parse, return a list of numbers for each line. If the EOF (end-of-file) is reached, you simply return null instead of a list.

A demo using ANTLR 2.7.6:

file: My.g

class MyParser extends Parser;

parse returns [java.util.List<Integer> numbers]
{
  numbers = new java.util.ArrayList<Integer>();
}
  :  (n:Number {numbers.add(Integer.valueOf(n.getText()));})+ LineBreak
  |  EOF {numbers = null;}
  ;

class MyLexer extends Lexer; 

Number
  :  ('0'..'9')+
  ;

LineBreak
  :  ('\r')? '\n'
  ;

Space
  :  (' ' | '\t') {$setType(Token.SKIP);}
  ;

file: Main.java

import antlr.*;

public class Main {
  public static void main(String[] args) throws Exception {
    MyLexer lexer = new MyLexer(new java.io.StringReader("1 2 3\n4 5 6 7 8\n9 10\n"));
    MyParser parser = new MyParser(new TokenBuffer(lexer));
    int line = 0;
    java.util.List<Integer> numbers = null;
    while((numbers = parser.parse()) != null) {
      line++;
      System.out.println("line " + line + " = " + numbers);
    }
  }
}

To run the demo on:

*nix

java -cp antlr-2.7.6.jar antlr.Tool My.g
javac -cp antlr-2.7.6.jar *.java
java -cp .:antlr-2.7.6.jar Main

or on:

Windows

java -cp antlr-2.7.6.jar antlr.Tool My.g
javac -cp antlr-2.7.6.jar *.java
java -cp .;antlr-2.7.6.jar Main

which will produce the following output:

line 1 = [1, 2, 3]
line 2 = [4, 5, 6, 7, 8]
line 3 = [9, 10]
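
If you want the for-each style from the question, the "return null at EOF" convention can be wrapped in an Iterable. The following is only a sketch under some assumptions: NullTerminatedIterable is a name I made up, it requires Java 8 or later for the lambda, and the demo uses a stand-in source instead of the generated parser so that it runs without the ANTLR jar.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.Callable;

/** Hypothetical adapter: iterates over a source that returns null when exhausted. */
public class NullTerminatedIterable<T> implements Iterable<T> {
    private final Callable<T> source;

    public NullTerminatedIterable(Callable<T> source) {
        this.source = source;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private T next;        // look-ahead element, null if none is buffered
            private boolean done;  // true once the source has returned null

            @Override
            public boolean hasNext() {
                if (next == null && !done) {
                    try {
                        next = source.call();
                    } catch (Exception e) {
                        // Checked parser exceptions are rethrown unchecked.
                        throw new IllegalStateException("source failed", e);
                    }
                    done = (next == null);
                }
                return next != null;
            }

            @Override
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                T result = next;
                next = null;
                return result;
            }
        };
    }

    public static void main(String[] args) {
        // Stand-in for parser.parse(): one list per "line", then null at the end.
        final Iterator<List<Integer>> lines = Arrays.asList(
                Arrays.asList(1, 2, 3),
                Arrays.asList(4, 5, 6, 7, 8),
                Arrays.asList(9, 10)).iterator();
        Callable<List<Integer>> fakeParse = () -> lines.hasNext() ? lines.next() : null;

        int line = 0;
        for (List<Integer> numbers : new NullTerminatedIterable<>(fakeParse)) {
            line++;
            System.out.println("line " + line + " = " + numbers);
        }
    }
}
```

With the generated MyParser from the demo, passing parser::parse as the Callable should presumably work, since parse() returns the list and only throws checked exceptions, but I have not tried that against ANTLR 2.7.6. Note that the adapter is single-pass: every iterator it hands out draws from the same underlying source, and only one element is held in memory at a time.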

Warning

Anyone trying this code, please note that this uses ANTLR 2.7.6. Unless you have a very compelling reason to use this version, it is highly recommended to use the latest stable version of ANTLR (v3.3 at the time of this writing).
