Java InputStream read method returns ASCII 'NUL' characters for a file on an NFS mount
I have a Java process that reads a given file using the Java RandomAccessFile and does some processing based on the file contents. The file is a log file that is updated by another Java process. The process that reads the file runs on a different machine and accesses the file on the remote server through an NFS mount. The reader polls for changes by comparing the file length with the current position of the RandomAccessFile, and calls a handler method for each byte it encounters. The issue is that I sometimes get ASCII 'NUL' characters back from the RandomAccessFile read method:
int charInt = read();
That is, charInt is 0 on some occasions, and after some time read() returns valid characters again. However, the characters that should have been read while the stream was returning NULs are lost.
I tried using http://commons.apache.org/io/apidocs/org/apache/commons/io/input/Tailer.html, where I get notified of each line, but I sometimes see ASCII NUL characters in those lines too.
I have also gone through the question Java IO implementation of unix/linux "tail -f". My Java process does something similar, but I am starting to think the issue is with the NFS mount, or with some buggy Java IO behaviour when reading from an NFS mount. I ran some tests reading from a normal file (not on an NFS mount) while another process wrote to it continuously. All of these tests were successful.
I also tried a Java BufferedReader, since the file is really a character stream even though I can treat it as a byte stream. I still get the NUL characters.
I am not sure whether this matters, but the NFS mount is read-only (ro).
I would appreciate any help on this. Thanks.
I tried the following as well:
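    import java.io.BufferedWriter;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.util.Scanner;

    public class TailToFile {
        public static void main(String[] args) throws IOException {
            FileWriter fileWriter;
            try {
                // Open the output file in append mode.
                fileWriter = new FileWriter("<OUT_FILE>", true);
            } catch (IOException e) {
                throw new RuntimeException("Exception while creating file to write sent messages", e);
            }
            BufferedWriter bufWriter = new BufferedWriter(fileWriter);
            // Let the system tail follow the input file; passing the arguments
            // separately avoids problems with spaces in the path.
            Process p = new ProcessBuilder("tail", "-f", "<PATH_TO_IN_FILE>").start();
            Scanner s = new Scanner(p.getInputStream());
            while (s.hasNextLine()) {
                String line = s.nextLine();
                // Copy each line to the output file and flush immediately.
                bufWriter.write(line);
                bufWriter.write(System.lineSeparator());
                bufWriter.flush();
            }
            bufWriter.close();
        }
    }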
I still get the NUL characters. Here I write the lines I read to a file so that I can compare the IN file and the OUT file. On some occasions lines are skipped (with NUL characters in their place). All other lines compare fine: out of about 13000 lines, we see a mismatch in about 100. Another strange thing is that I also had less running on the file, and I can see the NUL characters there as well. They basically look like this:
^C^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
followed by valid lines. One more thing I noticed: at the time the lines were missed, the writing process was updating the file very quickly. One XML message was written to the file at 20110729 13:44:06.070097 and the next one at 20110729 13:44:06.100007, and the lines were missed from this second XML message. A further finding: the path we are reading the files from is on a shared NAS.
Comments (2)
I realize this question is now more than a year old, but I will add what I know to it, in case others with this issue stumble across it as I have.
The NUL characters described in this question appear due to asynchronous writes to the file being read from. More specifically, packets of data from the remote file writer have arrived out of order, and the NAS buffer has committed a later packet and padded the area for the unreceived data with NUL characters. When the missing packet is received, the NAS buffer commits it, overwriting those null characters.
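The effect of a later chunk being committed before an earlier one can be reproduced on a local file, which may help when testing a reader. The sketch below is only an illustration: it assumes a file system that zero-fills the gap when data is written past the end of the file (the usual behaviour on Linux), and the class name and sample text are made up for the example. It does not involve NFS or a NAS at all.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class OutOfOrderWriteDemo {
    /** Writes the later chunk first, snapshots the file, then fills the gap. */
    static byte[][] run() throws IOException {
        Path file = Files.createTempFile("nul-demo", ".log");
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.write("line1\n".getBytes(StandardCharsets.US_ASCII)); // offsets 0..5
            raf.seek(12);                                              // leave offsets 6..11 unwritten
            raf.write("line3\n".getBytes(StandardCharsets.US_ASCII)); // offsets 12..17
            byte[] early = Files.readAllBytes(file);                   // what a reader sees at this point
            raf.seek(6);
            raf.write("line2\n".getBytes(StandardCharsets.US_ASCII)); // the "late" chunk arrives
            byte[] late = Files.readAllBytes(file);
            return new byte[][] { early, late };
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[][] snapshots = run();
        // Show NUL bytes as '@' so they are visible on a terminal.
        System.out.println(new String(snapshots[0], StandardCharsets.US_ASCII).replace('\0', '@'));
        System.out.println(new String(snapshots[1], StandardCharsets.US_ASCII));
    }
}
```

A reader that takes the first snapshot sees a run of NULs where the second line should be; a reader that waits sees the real text.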
In the application where we first encountered this, we read a file line by line and keep track of the last line number successfully read (so we can stop at any time and start again where we left off). Our interim solution is simply to check specifically for "\0" on every read and, when it is encountered, close the file, wait 1 second, reopen the file, and skip forward to where we left off. Usually, by the time we read the line again, the actual text has been committed.
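A minimal sketch of that close, wait and reopen loop follows. It is not the code from our application: the class name, the retry limit and the handler callback are invented for the example, and it assumes the writer appends whole lines. ISO-8859-1 is used only so that every byte decodes without an error.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

class ReopenOnNulReader {
    private final Path file;
    private final long retryDelayMillis;
    private long linesDone = 0; // number of lines already handed to the handler

    ReopenOnNulReader(Path file, long retryDelayMillis) {
        this.file = file;
        this.retryDelayMillis = retryDelayMillis;
    }

    /**
     * Delivers every clean line after the last one delivered. When a line contains
     * NUL, the file is closed and reopened after a delay, up to maxRetries times.
     * Returns true if the end of the file was reached without hitting a NUL line.
     */
    boolean drain(Consumer<String> handler, int maxRetries)
            throws IOException, InterruptedException {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (attempt > 0) {
                Thread.sleep(retryDelayMillis);
            }
            if (readPass(handler)) {
                return true;
            }
        }
        return false;
    }

    /** One open/read/close pass; returns false as soon as a line contains NUL. */
    private boolean readPass(Consumer<String> handler) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            // Skip forward to where we left off.
            for (long i = 0; i < linesDone; i++) {
                if (reader.readLine() == null) {
                    return true; // file is shorter than expected; nothing new to read
                }
            }
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.indexOf('\0') >= 0) {
                    return false; // data not committed yet; caller retries after a delay
                }
                handler.accept(line);
                linesDone++;
            }
            return true;
        }
    }
}
```

With a 1 second delay this would be constructed as new ReopenOnNulReader(path, 1000) and drain would be called on every poll. Skipping by line count means each retry rereads the file from the start, which is simple but gets slower as the log grows.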
While closing and reopening the file may seem dramatic, recovering without doing this is problematic. You can't use mark()/reset() on the BufferedReader to resolve it, because once characters have been read into the reader's buffer they are not reread from the file; the same buffered characters are simply returned again every time you try to read.
Getting the underlying FileChannel, and reading and setting position() also fails because your position in the file includes characters read into the buffer that you may not have seen yet, and you will end up skipping that unseen data.
We are testing a solution where we have extended the InputStreamReader class and overridden the read(char[], int, int) method so that it uses the FileChannel to get the position before each read, calls the superclass's read method, checks for \0, and, if one is found, resets the FileChannel position and returns 0 as the number of characters read.
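The same idea is easier to show at the byte level, where no decoder buffer sits between the caller and the file. The sketch below is not the InputStreamReader subclass described above; it swaps in a plain RandomAccessFile (as used in the question), and the class and method names are invented for the example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

class NulAwareTail {
    /**
     * Reads up to buf.length bytes from the current file pointer. If the chunk
     * contains a NUL byte, the pointer is moved back to the start of the chunk
     * and 0 is returned so the caller can retry later. Returns -1 at end of file.
     */
    static int readChunk(RandomAccessFile raf, byte[] buf) throws IOException {
        long start = raf.getFilePointer();    // remember where this read began
        int n = raf.read(buf, 0, buf.length);
        if (n <= 0) {
            return n;                         // end of file, or an empty buffer
        }
        for (int i = 0; i < n; i++) {
            if (buf[i] == 0) {
                raf.seek(start);              // rewind: this region is not committed yet
                return 0;
            }
        }
        return n;
    }
}
```

A caller would treat a return value of 0 like "no new data", sleep, and call readChunk again. This only works if the log never legitimately contains NUL bytes, which is an assumption about the data and not something the code can check.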
Did you try something like this:
If nothing can be read from the file, currentLine will be null ...
I doubt there is a specific NFS + Java problem; the fact that you access the file via NFS should be invisible to the VM.