What is the most efficient way to parse this with Java's Scanner library?
I'm trying to parse a section of a large file with Java's Scanner library, but I'm having a hard time determining the best way to parse this text.
SECTOR 199
FLAGS 0x1000
AMBIENT LIGHT 0.67
EXTRA LIGHT 0.00
COLORMAP 0
TINT 0.00 0.00 0.00
BOUNDBOX 7.399998 8.200002 6.199998 9.399998 8.500000 7.099998
COLLIDEBOX 7.605121 8.230770 6.200000 9.399994 8.469233 7.007693
CENTER 8.399998 8.350001 6.649998
RADIUS 1.106797
VERTICES 12
0: 1810
1: 1976
2: 1977
3: 1812
4: 1978
5: 1979
6: 1820
7: 1980
8: 1821
9: 1981
10: 1982
11: 1811
SURFACES 1893 8
It has some optional fields (SOUND, COLLIDEBOX), so I can't parse in a fixed order like I've been doing with the previous part of the file. I'm unsure how to do this without making it terribly inefficient. At the moment I'm thinking about reading each line and then splitting it with String.split("\\s+") to get the values, but I'm curious what other options I have. :\
Comments (4)
If the file is very big, I suggest using java.io.RandomAccessFile: it can skip any region you don't want to parse, and it's very fast. If you map the whole file into memory, it may slow down your application.
You can also use java.util.StringTokenizer to split simple cases, for example on whitespace or commas. It's faster than a regular expression.
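As a rough illustration of the StringTokenizer suggestion, here is a minimal sketch that splits one line of the sample input on spaces and tabs. The class name TokenizerSketch and the method tokens are made up for this example and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class, named only for this sketch.
class TokenizerSketch {
    // Splits a line on spaces and tabs. Runs of delimiters yield no empty tokens.
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, " \t");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> t = tokens("CENTER 8.399998 8.350001 6.649998");
        String keyword = t.get(0);                 // first token is the field name
        double x = Double.parseDouble(t.get(1));   // remaining tokens are the values
        System.out.println(keyword + " x=" + x);
    }
}
```

Note that two-word field names in the sample, such as AMBIENT LIGHT, come back as two separate tokens, so the caller still has to decide how many tokens make up the name.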
How about this approach: create a Command interface, a separate implementation of it for each type of command you can encounter, and some sort of factory to find commands. (The code that goes with this answer was posted with the caveat that it may not compile.)
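A minimal sketch of those three pieces, under stated assumptions: the names Sector, SectorCommand, RadiusCommand, CommandFactory and CommandParser are hypothetical, only two of the keywords from the sample are wired up, and this is one possible reading of the answer rather than its actual code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for parsed values; only two fields are shown.
class Sector {
    int id;
    double radius;
}

// One command per keyword: it receives the values after the keyword.
interface Command {
    void execute(String[] args, Sector sector);
}

class SectorCommand implements Command {
    public void execute(String[] args, Sector sector) {
        sector.id = Integer.parseInt(args[0]);
    }
}

class RadiusCommand implements Command {
    public void execute(String[] args, Sector sector) {
        sector.radius = Double.parseDouble(args[0]);
    }
}

// The "factory": a lookup table from keyword to command instance.
class CommandFactory {
    private static final Map<String, Command> COMMANDS = new HashMap<>();
    static {
        COMMANDS.put("SECTOR", new SectorCommand());
        COMMANDS.put("RADIUS", new RadiusCommand());
    }

    // Returns null when no command is registered for the keyword.
    static Command find(String keyword) {
        return COMMANDS.get(keyword);
    }
}

class CommandParser {
    // Applies one input line to the sector; unknown keywords are skipped.
    static void parseLine(String line, Sector sector) {
        String[] parts = line.trim().split("\\s+");
        Command command = CommandFactory.find(parts[0]);
        if (command != null) {
            command.execute(Arrays.copyOfRange(parts, 1, parts.length), sector);
        }
    }
}
```

Because each line is dispatched by its keyword, optional fields such as SOUND or COLLIDEBOX need no special handling: a line that is absent simply never triggers its command, and the order of lines does not matter.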
I'd first define an enum with the keywords.
Parsing can then be done line by line, splitting at whitespace characters. I'd convert the first element to a constant of the Keyword enum and use a simple switch construct to handle the values.
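A minimal sketch of the enum-and-switch idea, assuming a subset of the keywords from the sample input. SectorData and KeywordParser are hypothetical names, and treating AMBIENT as the keyword with LIGHT as a second token is just one way to deal with the two-word field names.

```java
// Subset of the keywords in the sample; AMBIENT and EXTRA are followed by "LIGHT".
enum Keyword { SECTOR, FLAGS, AMBIENT, EXTRA, COLORMAP, RADIUS, VERTICES }

// Hypothetical holder for the parsed values.
class SectorData {
    int id;
    int flags;
    double ambientLight;
    double radius;
    int vertexCount;
}

class KeywordParser {
    // Parses one line; lines whose first token is not a Keyword are skipped.
    static void parseLine(String line, SectorData s) {
        String[] p = line.trim().split("\\s+");
        Keyword k;
        try {
            k = Keyword.valueOf(p[0]);
        } catch (IllegalArgumentException e) {
            return; // e.g. vertex rows like "0: 1810", or optional fields not handled here
        }
        switch (k) {
            case SECTOR:
                s.id = Integer.parseInt(p[1]);
                break;
            case FLAGS:
                s.flags = Integer.decode(p[1]); // decode accepts the 0x prefix
                break;
            case AMBIENT:
                s.ambientLight = Double.parseDouble(p[2]); // p[1] is "LIGHT"
                break;
            case RADIUS:
                s.radius = Double.parseDouble(p[1]);
                break;
            case VERTICES:
                s.vertexCount = Integer.parseInt(p[1]);
                break;
            default:
                break; // keyword known but not handled in this sketch
        }
    }
}
```

Keyword.valueOf throws IllegalArgumentException for any token that is not an enum constant, which is why the sketch catches it instead of letting unknown lines abort the parse.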
The input looks complex enough to warrant a full-blown parser. I would recommend using a library such as ANTLR ( http://www.antlr.org/ ).