Reading a file using a Java Scanner

Posted on 2024-09-26 01:03:19

One of the lines in a Java file I'm trying to understand is shown below.

return new Scanner(file).useDelimiter("\\Z").next();

According to the java.util.regex.Pattern documentation, \Z matches "The end of the input but for the final terminator, if any", so this call is expected to return the file's contents up to that point. What actually happens is that it returns only the first 1024 characters of the file. Is this a limitation imposed by the regex Pattern matcher? Can it be overcome? For now I'm going ahead with a FileReader, but I would like to know the reason for this behaviour.


陌上芳菲 2024-10-03 01:03:19

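I couldn't reproduce this myself, but I think I can shed some light on what is going on.

Internally, the Scanner uses a character buffer of 1024 characters. By default, the Scanner reads 1024 characters from your Readable, if possible, and then applies the pattern.

The problem is in your pattern: it will always match the end of the input, but that doesn't mean the end of your input stream/data. When Java applies your pattern to the buffered data, it tries to find the first occurrence of the end of input. Since 1024 characters are in the buffer, the matching engine treats position 1024 as the first match of the delimiter, and everything before it is returned as the first token.

For that reason, I don't think the end-of-input anchor is valid for use in a Scanner. It could be reading from an infinite stream, after all.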
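A quick way to see what a particular JVM actually does is to run the line from the question against a file longer than 1024 characters and compare lengths. The sketch below is only a check, not a fix; the class name ScannerBufferCheck, the temp file, and the 5000-character test data are all made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.Scanner;

class ScannerBufferCheck {
    // Same call as in the question: \Z (end of input) is the delimiter,
    // so the first token is whatever the Scanner considers "the input".
    static String firstToken(File file) throws IOException {
        try (Scanner sc = new Scanner(file)) {
            return sc.useDelimiter("\\Z").next();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical test data: 5000 ASCII characters, well past 1024.
        File file = File.createTempFile("scanner-check", ".txt");
        file.deleteOnExit();
        char[] chars = new char[5000];
        Arrays.fill(chars, 'x');
        Files.write(file.toPath(),
                new String(chars).getBytes(StandardCharsets.US_ASCII));

        // If the two numbers differ, the Scanner stopped early.
        System.out.println("file length:  " + file.length());
        System.out.println("token length: " + firstToken(file).length());
    }
}
```

If the token length matches the file length, the truncation is not reproducible in that environment, which would be consistent with the answer's opening remark.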


半步萧音过轻尘 2024-10-03 01:03:19

Try wrapping the file object in a FileInputStream.
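A minimal sketch of that suggestion, using the Scanner(InputStream) constructor. The class and method names are hypothetical, and the thread does not confirm that this resolves the asker's case.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Scanner;

class StreamScannerSketch {
    // Wraps the File in a FileInputStream before handing it to Scanner.
    // Note: next() throws NoSuchElementException if the file is empty.
    static String readViaStream(File file) throws IOException {
        try (InputStream in = new FileInputStream(file);
             Scanner sc = new Scanner(in)) {
            return sc.useDelimiter("\\Z").next();
        }
    }
}
```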


禾厶谷欠 2024-10-03 01:03:19

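Scanner is intended for reading multiple primitives from a file. It really isn't meant for reading an entire file.

If you don't want to include third-party libraries, you're better off looping over a BufferedReader that wraps a FileReader/InputStreamReader for text, or looping over a FileInputStream for binary data.

If you're OK with using a third-party library, Apache commons-io has a FileUtils class that contains the static methods readFileToString and readLines for text, and readFileToByteArray for binary data.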
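The library-free text approach could look roughly like the sketch below. The class name WholeFileReader, the method name readText, and the 4096-character chunk size are illustrative choices, not something prescribed by the answer.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

class WholeFileReader {
    // Reads the whole file as text by looping over a BufferedReader
    // that wraps an InputStreamReader with an explicit charset.
    static String readText(File file, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset))) {
            char[] chunk = new char[4096];
            int n;
            // read() returns -1 once the end of the stream is reached.
            while ((n = reader.read(chunk)) != -1) {
                sb.append(chunk, 0, n);
            }
        }
        return sb.toString();
    }
}
```

Reading in chunks rather than line by line keeps the original line terminators intact; a call such as WholeFileReader.readText(file, StandardCharsets.UTF_8) returns the content unchanged.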


七堇年 2024-10-03 01:03:19

You can use the Scanner class; just specify a charset when opening the scanner, i.e.:

Scanner sc = new Scanner(file, "ISO-8859-1");

Java converts the bytes read from the file into characters using the specified charset, or the default one (from the underlying OS) if none is given (source: http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Scanner.html#Scanner%28java.io.File,%20java.lang.String%29). It is still not clear to me why Scanner reads only 1024 bytes with the default charset, whilst with another one it reaches the end of the file. Anyway, it works fine!
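One way to investigate the difference is to look at Scanner.ioException(), which returns the last IOException thrown by the Scanner's underlying Readable; Scanner does not rethrow such exceptions itself. The sketch below assumes, without confirmation from this thread, that the early stop is caused by bytes the chosen charset cannot decode. ISO-8859-1 maps every byte to a character, so it cannot fail that way. The names ScannerCharsetCheck, Result, and read are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.Scanner;

class ScannerCharsetCheck {
    // Holds what the Scanner returned plus any IOException it swallowed.
    static final class Result {
        final String text;
        final IOException error;

        Result(String text, IOException error) {
            this.text = text;
            this.error = error;
        }
    }

    // Reads the first \Z-delimited token with the given charset and
    // reports the exception (if any) recorded by the Scanner.
    static Result read(File file, String charsetName) throws IOException {
        try (Scanner sc = new Scanner(file, charsetName)) {
            String text = sc.useDelimiter("\\Z").next();
            return new Result(text, sc.ioException());
        }
    }
}
```

Calling read(file, "UTF-8") and read(file, "ISO-8859-1") on the same file and comparing the text lengths shows whether the charset matters; a non-null error in one of the results would point to a decoding failure rather than a regex limitation.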
