Reading a file byte by byte with java.util.Scanner
I'm trying to read a single-line file character by character using java.util.Scanner. However, I get this exception:
Exception in thread "main" java.util.InputMismatchException: For input string: "contents of my file"
at java.util.Scanner.nextByte(Scanner.java:1861)
at java.util.Scanner.nextByte(Scanner.java:1814)
at p008.main(p008.java:18) <-- line where I do scanner.nextByte()
Here is my code:
public static void main(String[] args) throws FileNotFoundException {
    File source = new File("file.txt");
    Scanner scanner = new Scanner(source);
    while (scanner.hasNext()) {
        System.out.println((char) scanner.nextByte());
    }
    scanner.close();
}
Does anyone have any idea what I might be doing wrong?
Edit: I realized I wrote hasNext() instead of hasNextByte(). However, if I use hasNextByte(), nothing gets printed at all.
Comments (5)
Why on earth would you want to use a Scanner to read a file byte by byte? That's like using a wheelbarrow to transport your pocket change. (If you really need a wheelbarrow for your pocket change, let me know so I can become your friend.)

But seriously: class InputStream reads bytes from a file, simply and reliably, and does nothing else.

Class Scanner was added to the Java API (in Java 5) so textbook examples could pull data out of a file with less pain than is usually involved in the cascade of new BufferedReader(new InputStreamReader(new FileInputStream(...))). Its specialty is reading numbers and strings from free-form input files. The nextByte() method actually reads one or a few decimal digits from the input (if they're there) and converts the number thus scanned into a single byte value.

And if you're reading bytes, why do you want to output them as chars? Bytes are not chars, and brute-force interconverting will fail in some places. If you want to see the values of those bytes, read them with InputStream.read() and print them as they are, and you'll see small integers between 0 and 255.

If you want to read chars from a file, FileReader is the class for you.
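To make the byte/char distinction concrete, here is a minimal sketch that reads the same small file once as bytes and once as chars. The class name BytesVersusChars, the helper names, and the temporary file with the contents "Hi!" are all made up for illustration; they are not from the question.

```java
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class BytesVersusChars {

    // Reads every byte from the stream. read() returns 0..255, or -1 at end of input.
    static List<Integer> readBytes(InputStream in) {
        List<Integer> values = new ArrayList<>();
        try {
            int b;
            while ((b = in.read()) != -1) {
                values.add(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    // Reads every char from the reader. read() returns a char value, or -1 at end of input.
    static String readChars(Reader reader) {
        StringBuilder text = new StringBuilder();
        try {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file, created here so the example runs on its own.
        Path file = Files.createTempFile("scanner-demo", ".txt");
        Files.write(file, "Hi!".getBytes(StandardCharsets.US_ASCII));

        try (InputStream in = new FileInputStream(file.toFile())) {
            System.out.println(readBytes(in)); // [72, 105, 33]
        }
        try (Reader reader = new FileReader(file.toFile())) {
            // Decoded with the platform default encoding.
            System.out.println(readChars(reader));
        }
        Files.delete(file);
    }
}
```

For anything larger than a toy file you would probably wrap the stream or reader in a BufferedInputStream or BufferedReader, since single-byte read() calls on an unbuffered file stream are slow.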
Scanner is for parsing text data: its nextByte() method expects the input to consist of digits (possibly preceded by a sign).

You probably want to use a FileReader if you're actually reading text data, or a FileInputStream if it's binary data. Or a FileInputStream wrapped in an InputStreamReader if you're reading text with a specific character encoding (unfortunately, before Java 11 FileReader does not allow you to specify the encoding but uses the platform default encoding implicitly, which is often not good).
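As a rough illustration of why the encoding matters, the sketch below decodes the same two bytes twice through an InputStreamReader, once as UTF-8 and once as ISO-8859-1. The class name ExplicitEncoding, the decode helper, and the two-byte sample file are assumptions made for this example only.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitEncoding {

    // Decodes the whole stream with the given charset and closes it afterwards.
    static String decode(InputStream in, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file holding the UTF-8 encoding of U+00E9 (e with acute accent).
        Path file = Files.createTempFile("encoding-demo", ".txt");
        Files.write(file, new byte[] {(byte) 0xC3, (byte) 0xA9});

        String asUtf8 = decode(new FileInputStream(file.toFile()), StandardCharsets.UTF_8);
        String asLatin1 = decode(new FileInputStream(file.toFile()), StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.length());   // 1: the two bytes form one character
        System.out.println(asLatin1.length()); // 2: each byte becomes its own character
        Files.delete(file);
    }
}
```

The bytes on disk are identical in both calls; only the charset handed to InputStreamReader changes what comes out.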
When troubleshooting Scanner, check for underlying I/O errors: Scanner.ioException() returns the last IOException thrown by the underlying source, or null if there was none.

Though I'm with the others: this probably isn't the right class for the job. If you want byte input, use an InputStream (in this case, FileInputStream). If you want char input, use a Reader (e.g. InputStreamReader).
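One way the I/O check could look is sketched below. Scanner swallows IOExceptions from its source and simply behaves as if the input had ended, so the sketch drains the scanner and then asks ioException() what, if anything, went wrong. The class ScannerIoCheck, the drain helper, and the always-failing FailingReader are invented for the demonstration.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Scanner;

public class ScannerIoCheck {

    // Hypothetical reader that fails on every read, standing in for a broken file or socket.
    static final class FailingReader extends Reader {
        @Override
        public int read(char[] buffer, int offset, int length) throws IOException {
            throw new IOException("simulated read failure");
        }

        @Override
        public void close() {
            // Nothing to release.
        }
    }

    // Consumes all tokens, then returns the I/O error the scanner swallowed, or null if none.
    static IOException drain(Scanner scanner) {
        while (scanner.hasNext()) {
            scanner.next();
        }
        IOException error = scanner.ioException();
        scanner.close();
        return error;
    }

    public static void main(String[] args) {
        System.out.println(drain(new Scanner("a b c"))); // null

        IOException error = drain(new Scanner(new FailingReader()));
        System.out.println(error.getMessage()); // simulated read failure
    }
}
```

Without the ioException() call, the second scanner would look exactly like one reading an empty file.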
Scanner is all about reading delimited text (see the docs). nextByte will keep reading until it gets to whichever delimiter you specified (whitespace by default) and then try to convert that string into a byte.

So if you have 123 456 in a file, one call to nextByte will return 123, not 49 (the decimal value for the 1 character).

If you want to read byte by byte, you could use FileInputStream.
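The 123 versus 49 difference can be checked with a small sketch like the one below, which uses in-memory strings instead of a file so it runs on its own. The class TokensVersusBytes and its helper names are made up for illustration. The last helper also suggests why switching to hasNextByte() prints nothing for the file in the question: the first token, "contents", is not a number that fits in a byte.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class TokensVersusBytes {

    // Parses the first whitespace-delimited token as a decimal byte value.
    static byte firstScannerByte(String text) {
        try (Scanner scanner = new Scanner(text)) {
            return scanner.nextByte();
        }
    }

    // Returns the raw value of the first byte of the ASCII-encoded text.
    static int firstRawByte(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.US_ASCII)).read();
    }

    // True only if the first token is a number in the byte range -128..127.
    static boolean startsWithByteToken(String text) {
        try (Scanner scanner = new Scanner(text)) {
            return scanner.hasNextByte();
        }
    }

    public static void main(String[] args) {
        System.out.println(firstScannerByte("123 456"));                // 123
        System.out.println(firstRawByte("123 456"));                    // 49
        System.out.println(startsWithByteToken("contents of my file")); // false
        System.out.println(startsWithByteToken("456"));                 // false: out of byte range
    }
}
```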
Let me just address the question about why you would want to force-fetch one byte. Suppose I am trying to parse this line:
" (literalize this that another) "
The open and close parentheses are not really delimiters, and there may or may not be whitespace between '(' and "lit...". If you try to fetch with next(), you get "(literalize". I think we need to force-fetch "(" and then fetch "literalize", but I don't know how to do that.
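One possible approach, offered as a sketch rather than a definitive answer, is Scanner.findInLine, which ignores the scanner's delimiters and returns the next match of a pattern. The token pattern below (a single parenthesis, or a run of characters that are neither whitespace nor parentheses) is an assumption about what counts as a token here, and the class name ParenTokens is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class ParenTokens {

    // Assumed token shape: one parenthesis, or a run of non-whitespace, non-parenthesis characters.
    private static final Pattern TOKEN = Pattern.compile("[()]|[^\\s()]+");

    // Splits one line into tokens; findInLine skips whitespace and returns null when nothing is left.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(line)) {
            String token;
            while ((token = scanner.findInLine(TOKEN)) != null) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [(, literalize, this, that, another, )]
        System.out.println(tokenize(" (literalize this that another) "));
    }
}
```

Because the pattern treats each parenthesis as its own token, it gives the same result whether or not there is whitespace after the opening parenthesis. For a real grammar, a dedicated tokenizer such as java.io.StreamTokenizer may be a better fit than Scanner.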