How can I use regular expressions to parse a file in Java?
I'm trying to use a series of regular expressions to parse tokens from a file. I need to count newlines and be able to separate tokens that have no whitespace between them. Unfortunately, java.util.Scanner's findWithinHorizon() method searches the entire rest of the input stream (up to the horizon) for the START of the regex match, but I want to match the regex starting at the current file position. Specifically, I have a set of regexes and want to loop through them to see which one matches starting at the current position in the file, then advance the file position to just past the match, and continue. Is this possible?
Scanner's next() method seems useless for this because it enforces delimiters and the regex must match the entire token; I want to match from the current file position, get the matched string, and advance the file position to just after the match.
Comments (1)
Options:
1. Read the whole file into memory as a String. Then use Matcher directly at the positions you want.
2. Use a FileChannel acquired from a RandomAccessFile as the input for the Scanner. You can then directly manipulate the position of the channel.
3. Use a FileChannel as above, but use Matcher directly for greater flexibility.

An example of using a Matcher with a RandomAccessFile:
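A minimal sketch of this approach, not the original answer's code, under a few stated assumptions: the file is UTF-8 and small enough to decode into memory in one piece, and the class name AnchoredTokenizer, the Token type, and the token patterns are all hypothetical placeholders. The key calls are Matcher.region(), which restricts matching to start at the current position, and Matcher.lookingAt(), which only succeeds if the match begins exactly at the start of that region.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchoredTokenizer {
    /** A matched token together with the 1-based line it starts on. */
    static final class Token {
        final String text;
        final int line;
        Token(String text, int line) { this.text = text; this.line = line; }
    }

    // Hypothetical token patterns, tried in order; replace with your own.
    private static final Pattern NEWLINE = Pattern.compile("\\r?\\n");
    private static final Pattern SPACE = Pattern.compile("[ \\t]+");
    private static final Pattern[] PATTERNS = {
        NEWLINE,
        SPACE,
        Pattern.compile("[A-Za-z_]\\w*"), // identifier
        Pattern.compile("\\d+"),          // integer literal
        Pattern.compile("[-+*/=();]"),    // single-character punctuation
    };

    /** Reads the file through a FileChannel and tokenizes the decoded text. */
    static List<Token> tokenizeFile(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r");
             FileChannel channel = raf.getChannel()) {
            // Assumption: the file fits in a single in-memory buffer.
            ByteBuffer bytes = ByteBuffer.allocate((int) channel.size());
            while (bytes.hasRemaining() && channel.read(bytes) != -1) {
                // keep reading until the buffer is full or EOF is reached
            }
            bytes.flip();
            CharBuffer chars = StandardCharsets.UTF_8.decode(bytes);
            return tokenize(chars); // CharBuffer is a CharSequence
        }
    }

    /** Tries each pattern at the current position and advances past the match. */
    static List<Token> tokenize(CharSequence input) {
        List<Token> tokens = new ArrayList<>();
        int line = 1;
        int pos = 0;
        while (pos < input.length()) {
            Pattern hit = null;
            Matcher m = null;
            for (Pattern p : PATTERNS) {
                // region() makes pos the start of the search area;
                // lookingAt() requires the match to begin exactly there.
                m = p.matcher(input).region(pos, input.length());
                if (m.lookingAt() && m.end() > pos) {
                    hit = p;
                    break;
                }
            }
            if (hit == null) {
                throw new IllegalStateException("No pattern matches at offset " + pos);
            }
            if (hit == NEWLINE) {
                line++;                       // count newlines, emit no token
            } else if (hit != SPACE) {
                tokens.add(new Token(m.group(), line));
            }
            pos = m.end();                    // advance to just past the match
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        List<Token> tokens = args.length > 0
                ? tokenizeFile(args[0])
                : tokenize("x=42;\ny = x+1");
        for (Token t : tokens) {
            System.out.println(t.line + ": " + t.text);
        }
    }
}
```

Because the position is just an int offset into the decoded text, the loop can also back up or skip ahead freely. For files too large to hold in memory, the same loop would need to run over successive windows of the channel, which this sketch does not attempt.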