Getting the file format in Java (DOS/UNIX)
Is there an easy way to see whether a particular file has DOS/MAC/UNIX line endings?
Currently I read the file byte by byte and stop if I see a Windows carriage return:
for (byte thisByte : bytes) {
    if ((!isDos) && (thisByte == 13)) {
        isDos = true;
    }
    ...
Is there a way to get the same information without reading the file byte by byte?
Comments (3)
A possible optimization might be to look only at the final one or two bytes of the file. Since many text files end with a line ending, this should work most of the time. If you don't spot a line ending there, you'll have to fall back to a byte-by-byte scan.
BTW, your example code sets isDos to true without checking whether the very next character is a decimal 10 (LF). If it isn't a 10, then it's probably the Mac file format.
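A minimal sketch of the "look at the last two bytes" idea, in case it helps. The class name TrailingEolCheck and the string labels are made up for illustration, and it assumes the file uses a single-byte-compatible encoding (ASCII, UTF-8, Latin-1) where CR is 13 and LF is 10:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper (not from the original post): classifies a file
// by inspecting only its final one or two bytes.
final class TrailingEolCheck {

    /** Returns "DOS", "UNIX", "MAC", or "UNKNOWN". */
    static String detect(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long len = raf.length();
            if (len == 0) {
                return "UNKNOWN"; // empty file: nothing to inspect
            }
            raf.seek(len - 1);
            int last = raf.read();
            int prev = -1; // stays -1 when the file is a single byte long
            if (len >= 2) {
                raf.seek(len - 2);
                prev = raf.read();
            }
            if (last == '\n') {
                // LF preceded by CR is the DOS pair; a bare LF is UNIX.
                return prev == '\r' ? "DOS" : "UNIX";
            }
            if (last == '\r') {
                return "MAC"; // bare CR at the end: classic Mac style
            }
            // No trailing line ending: the caller has to fall back to a scan.
            return "UNKNOWN";
        }
    }
}
```

Calling TrailingEolCheck.detect(new File("some.txt")) costs two small reads regardless of file size, but "UNKNOWN" is a real outcome that still needs the byte-by-byte fallback.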
Assuming that it's a text file, and the lines are "reasonable" length, you could read a large block of the file (say 4096 bytes) and scan just that block for the CR character.
But otherwise, no, the only way that you can find a character in a file is to actually read the entire file and look for the character.
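Roughly what the block approach could look like. BlockScan and hasCarriageReturn are illustrative names, and the 4096-byte size is just the figure mentioned above. A result of false only means no CR was seen in the first block, not that the whole file is CR-free:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (not from the original post): reads one block from
// the start of the file and scans only that block for a CR byte.
final class BlockScan {

    static final int BLOCK_SIZE = 4096; // assumed "large block" size

    /** Returns true if a CR (byte 13) occurs within the first BLOCK_SIZE bytes. */
    static boolean hasCarriageReturn(String path) throws IOException {
        byte[] block = new byte[BLOCK_SIZE];
        int n = 0;
        try (InputStream in = new FileInputStream(path)) {
            // A single read() may return fewer bytes than requested,
            // so keep reading until the block is full or the file ends.
            while (n < block.length) {
                int r = in.read(block, n, block.length - n);
                if (r < 0) {
                    break;
                }
                n += r;
            }
        }
        for (int i = 0; i < n; i++) {
            if (block[i] == '\r') {
                return true;
            }
        }
        return false;
    }
}
```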
On the assumption that you're asking this question because you have performance problems reading the file a byte at a time: make sure that you wrap the FileInputStream with a BufferedInputStream.
If you know that a file only uses one sort of end-of-line, then you can just scan for the first newline and see if it's DOS/UNIX/Mac.
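Something along these lines, as a sketch only. FirstEolScan and its return labels are invented for the example, the stream is wrapped in a BufferedInputStream so the single-byte reads stay cheap, and it assumes an encoding where CR and LF are the single bytes 13 and 10:

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (not from the original post): stops at the first
// line ending and classifies the file from that one occurrence.
final class FirstEolScan {

    /** Returns "DOS", "UNIX", "MAC", or "UNKNOWN" if no line ending exists. */
    static String detect(String path) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(path))) {
            int b;
            while ((b = in.read()) != -1) {
                if (b == '\n') {
                    return "UNIX"; // bare LF
                }
                if (b == '\r') {
                    // CR followed by LF is DOS; CR followed by anything
                    // else (or end of file) is treated as classic Mac.
                    return in.read() == '\n' ? "DOS" : "MAC";
                }
            }
        }
        return "UNKNOWN"; // the file has no line ending at all
    }
}
```

This also covers the point about checking the byte after the CR: the scan reads one more byte before deciding between DOS and Mac.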