What is the most efficient way to parse this with Java's Scanner library?

Posted on 2024-09-14 08:37:09

I'm trying to parse a section of a large file with Java's Scanner library, but I'm having a hard time determining the best way to parse this text.

SECTOR 199
FLAGS 0x1000
AMBIENT LIGHT 0.67
EXTRA LIGHT 0.00
COLORMAP 0
TINT 0.00 0.00 0.00
BOUNDBOX 7.399998 8.200002 6.199998 9.399998 8.500000 7.099998
COLLIDEBOX 7.605121 8.230770 6.200000 9.399994 8.469233 7.007693
CENTER 8.399998 8.350001 6.649998
RADIUS 1.106797
VERTICES 12
0: 1810
1: 1976
2: 1977
3: 1812
4: 1978
5: 1979
6: 1820
7: 1980
8: 1821
9: 1981
10: 1982
11: 1811
SURFACES 1893 8

It has some optional fields (SOUND, COLLIDEBOX), so I can't parse in a fixed order like I've been doing with the previous part of the file. I'm unsure how to do this without making it terribly inefficient. At the moment I'm thinking about reading each line and then splitting it with String.split("\\s+") to get the values, but I'm curious what other options I have.

Comments (4)

所有深爱都是秘密 2024-09-21 08:37:19

If the file is very big, I suggest using java.io.RandomAccessFile: it lets you seek directly to any region you want to parse, and it's very fast. Mapping the whole file into memory may slow down your application.
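As a rough illustration of the seek idea, here is a minimal sketch. It assumes the byte offset of the section is already known (for example from an index built earlier); the class name SeekDemo, the helper readLineAt and the temporary file are made up for this example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class SeekDemo {

    /** Reads one line starting at the given byte offset; bytes before the offset are never read. */
    static String readLineAt(Path file, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);       // jump straight to the section
            return raf.readLine();  // maps each byte to one char, which is fine for ASCII data
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file: a header we do not care about, followed by a sector block
        String header = "HEADER that we do not need\n";
        Path file = Files.createTempFile("sectors", ".txt");
        Files.write(file, (header + "SECTOR 199\nFLAGS 0x1000\n").getBytes(StandardCharsets.US_ASCII));

        // Assumption: the offset of the section is known in advance
        System.out.println(readLineAt(file, header.length()));
        Files.delete(file);
    }
}
```

RandomAccessFile.readLine is unbuffered, so for reading many lines after the seek it is probably worth wrapping the data in a buffered stream instead.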

Alternatively, you can use java.util.StringTokenizer to split simple cases, for example on whitespace or commas. It is usually faster than splitting with a regular expression.
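A minimal sketch of the StringTokenizer variant, using one line from the question as input. The class name TokenizerDemo and the helper tokens are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo {

    /** Splits one line on whitespace without using a regular expression. */
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        // The one-argument constructor uses the default delimiters:
        // space, tab, newline, carriage return and form feed
        StringTokenizer st = new StringTokenizer(line);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> parts = tokens("CENTER 8.399998 8.350001 6.649998");
        String keyword = parts.get(0);                 // the field name
        double x = Double.parseDouble(parts.get(1));   // the first value
        System.out.println(keyword + " " + x + " " + parts);
    }
}
```

Runs of several delimiters are treated as one separator, so the result never contains empty tokens.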

奶茶白久 2024-09-21 08:37:17

How about this approach:

find next command (SECTOR, FLAGS, AMBIENT LIGHT, EXTRA LIGHT, etc)
no command found? -> output error and stop
map to command implementation 
execute command (pass it the scanner and your state holder)
command impl handles specific reading of arguments
rinse, repeat,...

You will have to create a Command interface:

import java.util.Scanner;

// ReadState is your own holder for the parsed values (not shown here).
public interface Command {
    String getName();
    void execute(Scanner in, ReadState state);
}

and a separate implementation of it for each type of command you can encounter:

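import java.util.Scanner;

public class SectorCommand implements Command {
    @Override
    public String getName() {
        return "SECTOR";
    }

    // Reads the single integer argument that follows the SECTOR keyword.
    @Override
    public void execute(Scanner in, ReadState state) {
        state.setSector(in.nextInt());
    }
}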

and some sort of factory to find commands:

import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class CommandFactory {

    private Map<String, Command> commands;

    public CommandFactory() {
        commands = new HashMap<String, Command>();
        addCommand(new SectorCommand());
        // add other commands
    }

    public Command findCommand(Scanner in) {
        for (Map.Entry<String, Command> entry : commands.entrySet()) {
            // findInLine returns the matched text, or null when the keyword
            // does not occur in the rest of the current line
            if (in.findInLine(entry.getKey()) != null) {
                return entry.getValue();
            }
        }
        throw new IllegalArgumentException("No command found");
    }

    private void addCommand(Command command) {
        commands.put(command.getName(), command);
    }
}

(this code may not compile)
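
The "find, map, execute, repeat" loop from the steps at the top could look roughly like the sketch below. It is a variation, not the factory as written: it looks up the first token of each line in a map instead of trying every keyword with findInLine, and it uses lambdas in place of one class per command. ReadState and its fields are hypothetical, and only three keywords are registered.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Scanner;

class CommandLoopDemo {

    /** Hypothetical state holder; the answer names ReadState but does not define it. */
    static class ReadState {
        int sector;
        double ambientLight;
        final List<Integer> vertices = new ArrayList<>();
    }

    /** Like the Command interface, minus getName(): the map key carries the name. */
    interface Command {
        void execute(Scanner in, ReadState state);
    }

    static Map<String, Command> commands() {
        Map<String, Command> commands = new HashMap<>();
        commands.put("SECTOR", (in, state) -> state.sector = in.nextInt());
        // "AMBIENT LIGHT" is two tokens: skip "LIGHT", then read the value
        commands.put("AMBIENT", (in, state) -> {
            in.next();
            state.ambientLight = in.nextDouble();
        });
        // "VERTICES n" is followed by n lines of the form "index: value"
        commands.put("VERTICES", (in, state) -> {
            int count = in.nextInt();
            for (int i = 0; i < count; i++) {
                in.next();                        // the "0:" style label
                state.vertices.add(in.nextInt());
            }
        });
        return commands;
    }

    static ReadState parse(String text) {
        Map<String, Command> commands = commands();
        ReadState state = new ReadState();
        // Locale.US so that nextDouble() accepts "0.67" whatever the default locale is
        try (Scanner in = new Scanner(text).useLocale(Locale.US)) {
            while (in.hasNext()) {
                String keyword = in.next();
                Command command = commands.get(keyword);
                if (command == null) {
                    throw new IllegalArgumentException("No command found for: " + keyword);
                }
                command.execute(in, state);
                if (in.hasNextLine()) {
                    in.nextLine();                // drop whatever is left on the line
                }
            }
        }
        return state;
    }

    public static void main(String[] args) {
        ReadState state = parse("SECTOR 199\nAMBIENT LIGHT 0.67\nVERTICES 2\n0: 1810\n1: 1976\n");
        System.out.println(state.sector + " " + state.ambientLight + " " + state.vertices);
    }
}
```

Because the order of the lines no longer matters, optional fields such as SOUND or COLLIDEBOX only need their own entry in the map. The nextLine() call after each command matters as well: without it the scanner would still be sitting at the end of the line it just read.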

旧话新听 2024-09-21 08:37:16

I'd first define an enum with the keywords, like:

 public enum Keyword {SECTOR, FLAGS, AMBIENT, EXTRA, COLORMAP, TINT, 
    BOUNDBOX, COLLIDEBOX, CENTER, RADIUS, VERTICES, SURFACES}

Parsing can be done line by line, splitting at whitespace chars. Then I'd convert the first element to an enum from the Keyword class and use a simple switch construct to handle the values:

// needs java.util.Iterator and java.util.List; Model is your own result class
public Model parse(List<String> lines) {

   Model model = new Model();

   Iterator<String> it = lines.iterator();
   while (it.hasNext()) {
      String[] elements = it.next().split("\\s+");

      // note: valueOf throws IllegalArgumentException for an unknown keyword
      switch (Keyword.valueOf(elements[0])) {
        case SECTOR: model.addSector(elements[1]); break;
        case FLAGS: model.addFlags(elements[1]); break;
        // ...
        case VERTICES:
          int numberOfVertices = Integer.parseInt(elements[1]);
          for (int i = 0; i < numberOfVertices; i++) {
             // vertex lines look like "0: 1810", so the value is elements[1]
             elements = it.next().split("\\s+");
             model.addVertice(i, elements[1]);
          }
          break;
        default:
          // handle keywords that have no case yet
          break;
      }
   }
   return model;
}
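
One thing to watch with this approach: Keyword.valueOf throws IllegalArgumentException when the first token is not in the enum (SOUND from the question, for instance), so the switch never gets a chance to handle it. A minimal sketch of a lookup that reports "unknown" instead of throwing; the class name KeywordLookupDemo and the helper keywordOf are made up for this example.

```java
import java.util.Optional;

class KeywordLookupDemo {

    enum Keyword { SECTOR, FLAGS, AMBIENT, EXTRA, COLORMAP, TINT,
        BOUNDBOX, COLLIDEBOX, CENTER, RADIUS, VERTICES, SURFACES }

    /** Returns the keyword that starts the line, or empty if the first token is not a known keyword. */
    static Optional<Keyword> keywordOf(String line) {
        String[] elements = line.trim().split("\\s+");
        try {
            return Optional.of(Keyword.valueOf(elements[0]));
        } catch (IllegalArgumentException e) {
            // unknown keyword, or a blank line (the first element is then "")
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(keywordOf("RADIUS 1.106797")); // known keyword
        System.out.println(keywordOf("SOUND 3"));         // not in the enum
    }
}
```

The caller can then decide whether an empty result means "skip this line" or "report a malformed file".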

岁吢 2024-09-21 08:37:14

The input looks complex enough to warrant a full-blown parser. I would recommend using a library such as ANTLR ( http://www.antlr.org/ ).
