Java RandomAccessFile - dealing with different linefeed styles?
I'm trying to seek through a RandomAccessFile, and as part of an algorithm I have to read a line and then seek backwards from the end of that line.
For example:
String line = raf.readLine();
raf.seek(raf.getFilePointer() - line.length() + m.start() + m.group().length());
// m is a Matcher for regular expressions
I've been getting loads of off-by-one errors and couldn't figure out why. I just discovered it's because some of the files I'm reading have Windows-style line endings, \r\n, and some have just UNIX-style \n.
Is there an easy way to have the RandomAccessFile treat all line endings as Windows-style line endings?
Comments (2)
You could always back the stream up two bytes and re-read them to see whether the line ended in \r\n or in a \n that is not preceded by \r.
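A minimal sketch of that idea in Java, not the answerer's original code. The class and method names (`LineTerminators`, `lastTerminatorLength`, `lineStart`) are hypothetical. It assumes the method is called right after a `readLine()` that returned a non-null line, and it relies on `RandomAccessFile.readLine()` turning each byte into one char, so `line.length()` equals the number of bytes in the line.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

final class LineTerminators {
    /**
     * Call right after raf.readLine() returned a non-null line.
     * Returns the byte length of the terminator that readLine() consumed:
     * 2 for \r\n, 1 for a lone \n or a lone \r, 0 if the line ended at EOF.
     * The file pointer is restored before returning.
     */
    static int lastTerminatorLength(RandomAccessFile raf) throws IOException {
        long end = raf.getFilePointer();
        if (end == 0) {
            return 0; // nothing has been read yet
        }
        int length = 0;
        raf.seek(end - 1);
        int last = raf.read();
        if (last == '\n') {
            length = 1;
            if (end >= 2) {
                raf.seek(end - 2);
                if (raf.read() == '\r') {
                    length = 2; // Windows-style \r\n
                }
            }
        } else if (last == '\r') {
            length = 1; // lone carriage return
        }
        raf.seek(end); // put the pointer back where readLine() left it
        return length;
    }

    /** Offset of the first byte of the line that readLine() just returned. */
    static long lineStart(RandomAccessFile raf, String line) throws IOException {
        return raf.getFilePointer() - lastTerminatorLength(raf) - line.length();
    }
}
```

With a helper like this, the seek from the question could be written as `raf.seek(LineTerminators.lineStart(raf, line) + m.end())`, since `m.end()` equals `m.start() + m.group().length()`. A blank line simply yields an empty string and a terminator length of 1 or 2.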
I'm not sure exactly where you are trying to place the file pointer, so adjust the constants (2 for \r\n, 1 for \n) appropriately. You may also need an extra check for blank lines (\n\n) if they occur in your file: if one shows up, you might get stuck in an infinite loop unless you have code to step past it.
No. RandomAccessFile and related abstractions (including the underlying file systems) model files as an indexable sequence of bytes. They neither know nor care about lines or line terminators.
What you need to do is record the actual positions of line starts rather than trying to figure out where they are based on assumptions about what the line termination sequence is. Alternatively, use a line reader that captures the line termination sequence for each line that it reads, either as part of the line or in an attribute that can be accessed after reading each input line.
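As a rough illustration of recording the line start, here is a sketch under stated assumptions: `LineSeek` and `seekPastMatch` are hypothetical names, and the target position (just past the regex match) is inferred from the question's own seek expression. It relies on `RandomAccessFile.readLine()` mapping one byte to one char, so a char index in the line is also a byte offset from the line start.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LineSeek {
    /**
     * Reads the next line and, if the pattern is found in it, moves the file
     * pointer to just after the match and returns that position.
     * Returns -1 at end of file or when the line has no match; in the
     * no-match case the pointer stays after the line that was read.
     */
    static long seekPastMatch(RandomAccessFile raf, Pattern pattern) throws IOException {
        long lineStart = raf.getFilePointer(); // remember where the line begins
        String line = raf.readLine();
        if (line == null) {
            return -1; // end of file
        }
        Matcher m = pattern.matcher(line);
        if (!m.find()) {
            return -1;
        }
        // m.end() == m.start() + m.group().length(); no terminator arithmetic needed
        long target = lineStart + m.end();
        raf.seek(target);
        return target;
    }
}
```

Because the start offset is captured before the read, the result is the same whether the line ends in \r\n, \n, or nothing at all. The length of the terminator, if it is ever needed, is the pointer position after `readLine()` minus `lineStart` minus `line.length()`.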
Alternatively, convert all the files to use DOS line termination sequences before you open them for random access.
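One possible way to do that conversion, as a sketch only: `CrlfNormalizer`, `toCrlf`, and `convertInPlace` are hypothetical names. It assumes the file fits in memory and uses an ASCII-compatible encoding (for example UTF-8 or ISO-8859-1), and it leaves lone \r characters untouched.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class CrlfNormalizer {
    /**
     * Returns a copy of data in which every \n that is not already preceded
     * by \r becomes \r\n. Existing \r\n pairs are copied unchanged.
     */
    static byte[] toCrlf(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length + 16);
        for (int i = 0; i < data.length; i++) {
            byte b = data[i];
            if (b == '\n' && (i == 0 || data[i - 1] != '\r')) {
                out.write('\r'); // add the missing carriage return
            }
            out.write(b);
        }
        return out.toByteArray();
    }

    /** Rewrites the file so that all of its line endings are \r\n. */
    static void convertInPlace(Path file) throws IOException {
        Files.write(file, toCrlf(Files.readAllBytes(file)));
    }
}
```

After conversion every terminator is two bytes long, so fixed offset arithmetic like that in the question becomes consistent, at the cost of modifying the files (and changing all byte offsets recorded before the conversion).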