FileReader vs. FileInputStream; split vs. Pattern
I'm working with a file of about 2 GB. I want to read the file line by line to find some specific terms.
Which class is the better choice: FileReader or FileInputStream?
And how can I find the specific words efficiently? I'm currently using the split() method, but maybe I could use the java.util.regex.Pattern class in combination with the java.util.regex.Matcher class.
So the questions are:
Which class should I use: FileReader or FileInputStream?
Should I use the split method or the regex classes?
Does anyone have an answer to these questions? Thanks.
Comments (3)
You'll want to use a Reader (probably wrapped in a BufferedReader), since you're working with String data, as opposed to binary. You should pre-compile your pattern (Pattern.compile). Beyond that, it's unclear from your description whether you should use Pattern.split, or whether using a Matcher would be more appropriate. Note that str.split(regex, limit) is equivalent to Pattern.compile(regex).split(str, limit).
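
To illustrate the advice about pre-compiling, here is a minimal sketch that compiles each pattern once and reuses it for every line. The class name TermMatcher, the search terms "error" and "timeout", and the whitespace tokenizer are assumptions made up for this example; they are not taken from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TermMatcher {
    // Compiled once and reused for every line. The terms are hypothetical.
    private static final Pattern TERMS = Pattern.compile("\\b(?:error|timeout)\\b");

    // Precompiled tokenizer: same result as line.split("\\s+"),
    // but without recompiling the regex on every call.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Returns every occurrence of a search term in the line, in order of appearance. */
    static List<String> findTerms(String line) {
        List<String> hits = new ArrayList<>();
        Matcher m = TERMS.matcher(line);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    /** Splits the line into whitespace-separated tokens using the precompiled pattern. */
    static String[] tokens(String line) {
        return WHITESPACE.split(line);
    }

    public static void main(String[] args) {
        String line = "request failed: timeout after error 42";
        System.out.println(findTerms(line));     // matches via Matcher.find()
        System.out.println(tokens(line).length); // token count via Pattern.split()
    }
}
```

If the goal is only to locate the terms, the Matcher loop avoids building a token array for every line; split is the better fit when the tokens themselves are needed afterwards.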

The best option would be to use a BufferedReader (for its readLine() method) wrapping an InputStreamReader (for its ability to specify the encoding) wrapping a FileInputStream (for actually reading the file). FileReader uses the platform default encoding, which is usually a bad idea and makes the class mainly a trap for developers who are not aware of the potential for problems. If you just want to find substrings in the lines, String.indexOf() is the most efficient way; using regexes is better if you're actually looking for specific patterns.

The BufferedReader has a readLine() method that can be used for reading line by line. The Reader (and Writer) classes are meant for String data, whereas the InputStream (and OutputStream) classes should be used for binary data (byte arrays).
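
The reader chain and the indexOf() suggestion could be combined roughly as follows. This is only a sketch: the class name TermScanner, the method countLinesContaining, and the choice of UTF-8 in main are assumptions for illustration, and the real encoding of the file has to be known by the caller.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class TermScanner {
    /** Counts the lines of the file that contain the literal term. */
    static long countLinesContaining(String path, String term, Charset charset)
            throws IOException {
        long count = 0;
        // FileInputStream reads the bytes, InputStreamReader decodes them with an
        // explicit charset, BufferedReader adds buffering and readLine().
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Plain substring search; no regex is needed for a literal term.
                if (line.indexOf(term) >= 0) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java TermScanner <file> <term>");
            return;
        }
        // UTF-8 is an assumption here; pass the file's actual encoding.
        System.out.println(countLinesContaining(args[0], args[1], StandardCharsets.UTF_8));
    }
}
```

Because only one line is held in memory at a time, the size of the file (about 2 GB in the question) does not matter for memory use, as long as individual lines are of reasonable length.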