How can I have a text file (or XML file) represented as a whole string, and search for (or match) a particular string in it?
I have created a BufferedReader object:
BufferedReader input = new BufferedReader(new FileReader(aFile));
and then I have tried to use the Scanner class with its option to specify different delimiters, like this:
//Scanner scantext = new Scanner(input);
//Scanner scantext = new Scanner(input).useDelimiter("");
Scanner scantext = new Scanner(input).useDelimiter("\n");
while (scantext.hasNext()) { ... }
Using the Scanner class like this, I can read the text either line by line or word by word, but that doesn't help me, because sometimes the text I want to process contains </review><review>, and I would like to say: if you find "<review>" anywhere in the text, do something with the following lines (or piece of text) until you find "</review>". The problem is that <review> and </review> appear in different places in the text, and are sometimes glued to other text (so whitespace as a delimiter doesn't help me).
I thought I might use the regular expression API in Java (the Pattern and Matcher classes), but they seem to match a particular string or line, whereas I want to treat the text as one continuous string (at least that was my impression from what I have read about them). Could you tell me which structures/methods/classes I should use in this case? Thank you.
Don't try to parse XML with regular expressions; it leads only to pain. There are a lot of very nice existing XML APIs in Java already; why try to reinvent them? Anyway, to search for a string in a text file, you should:
- create a Pattern to search for
- use a Matcher to iterate through any matches
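A minimal sketch of the Pattern/Matcher approach, assuming the whole file has already been read into one String, that review elements are never nested, and that the tags carry no attributes (all assumptions; these limits are exactly why a real XML parser is recommended). Pattern.DOTALL lets `.` match line breaks, so a review may span several lines, and the non-greedy `.*?` stops at the first closing tag. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReviewRegex {
    // Non-greedy match between the tags; DOTALL makes '.' also match '\n',
    // so the captured text may cross line boundaries.
    private static final Pattern REVIEW =
            Pattern.compile("<review>(.*?)</review>", Pattern.DOTALL);

    /** Returns the raw text found between each <review> and the next </review>. */
    public static List<String> findReviews(String text) {
        List<String> reviews = new ArrayList<>();
        Matcher m = REVIEW.matcher(text);
        while (m.find()) {           // each find() resumes after the previous match
            reviews.add(m.group(1)); // group 1 = the text between the tags
        }
        return reviews;
    }

    public static void main(String[] args) {
        // Tags glued to surrounding text and a review spanning two lines.
        String text = "intro<review>first\nline two</review><review>second</review>tail";
        for (String r : findReviews(text)) {
            System.out.println("[" + r + "]");
        }
    }
}
```

Because the match works on the whole string rather than on Scanner tokens, it does not matter whether the tags are surrounded by whitespace.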
It looks to me as though you are trying to work with a structured XML file, and I would suggest that you look into javax.xml.parsers.DocumentBuilder or other built-in APIs to parse the document.
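A minimal sketch of the DocumentBuilder (DOM) route, assuming the input is well-formed XML. The sample root element `reviews` and the class name are hypothetical. The XML is parsed from a String here only to keep the example self-contained; `builder.parse(File)` works the same way for a file on disk.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ReviewDom {
    /** Parses the XML and returns the text content of every <review> element. */
    public static List<String> reviewTexts(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Builds the whole document tree in memory.
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        // All <review> elements anywhere in the tree, in document order.
        NodeList nodes = doc.getElementsByTagName("review");
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<reviews><review>good</review><review>bad</review></reviews>";
        System.out.println(reviewTexts(xml)); // [good, bad]
    }
}
```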
Use an XML parser.
Or use XPath, like in this example.
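The linked example is not preserved here, so this is only a sketch of the XPath approach using the JDK's javax.xml.xpath package, assuming well-formed input. The sample element names `shop` and `item` and the class name are hypothetical. The expression `//review` selects every review element, however deeply it is nested.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ReviewXPath {
    /** Evaluates //review against the XML and returns each match's text. */
    public static List<String> reviewTexts(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                "//review", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<shop><item><review>great</review></item><review>meh</review></shop>";
        System.out.println(reviewTexts(xml)); // [great, meh]
    }
}
```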
Um, does something prevent you from reading the XML file into a String and then operating on that using the regular expression API?
You can easily read a file into a String using, e.g., FileUtils from Apache Commons IO: see readFileToString(File file, String encoding).
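If you would rather avoid the Commons IO dependency, a swapped-in standard-library alternative (not what the answer above used) is `Files.readString`, available since Java 11. This sketch assumes the file is UTF-8 encoded; the temporary file and the class name are hypothetical and exist only to make the example self-contained.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadWholeFile {
    /** Reads the entire file into one String (Java 11+), decoding it as UTF-8. */
    public static String readAll(Path file) throws IOException {
        return Files.readString(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file, created only for this demonstration.
        Path file = Files.createTempFile("reviews", ".xml");
        Files.writeString(file, "a<review>x\ny</review>b", StandardCharsets.UTF_8);
        String text = readAll(file);
        // Line breaks are preserved, so a later regex search can span lines.
        System.out.println(text.contains("<review>x\ny</review>")); // true
        Files.delete(file);
    }
}
```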
I would also recommend using an XML parsing API. But since you only want to do something when you hit the "review" tag, SAX may suit you better than DOM.
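A minimal SAX sketch, assuming review elements are not nested inside one another (an assumption). SAX streams the document and calls the handler for each tag, so nothing outside the review elements has to be kept in memory. `characters` may deliver a single text node in several chunks, which is why the text is accumulated in a StringBuilder. The default factory is not namespace-aware, so the tag name arrives in `qName`. The class name and sample input are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ReviewSax {
    /** Streams the XML and collects the text inside each <review> element. */
    public static List<String> reviewTexts(String xml) throws Exception {
        List<String> reviews = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current; // non-null only while inside <review>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("review".equals(qName)) current = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (current != null) current.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("review".equals(qName) && current != null) {
                    reviews.add(current.toString()); // the "do something" step
                    current = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return reviews;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<r><review>one</review>noise<review>two</review></r>";
        System.out.println(reviewTexts(xml)); // [one, two]
    }
}
```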
I think here we can copy each line of the text file into a string and then try to match a substring (the search string) against that string (the line).
But errors occur when the search string contains metacharacters like / or #.
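One way to keep special characters in the search string from being read as regex syntax is `Pattern.quote`, which makes the whole string match literally (plain `String.contains` would also do). This is a sketch of the line-by-line approach described above; the class and method names are hypothetical. Note that searching one line at a time cannot find text that spans a line break, which is the limitation the question ran into.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class LiteralLineSearch {
    /** Returns the 1-based numbers of lines that contain {@code needle} as literal text. */
    public static List<Integer> matchingLines(BufferedReader reader, String needle) throws IOException {
        // Pattern.quote wraps the needle so characters such as . * ( [ lose their regex meaning.
        Pattern p = Pattern.compile(Pattern.quote(needle));
        List<Integer> hits = new ArrayList<>();
        String line;
        int lineNo = 0;
        while ((line = reader.readLine()) != null) {
            lineNo++;
            if (p.matcher(line).find()) hits.add(lineNo);
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        String text = "a/b#c\nx.y\na/b#c again";
        System.out.println(matchingLines(new BufferedReader(new StringReader(text)), "a/b#c")); // [1, 3]
        // Unquoted, "." would match any character on every line; quoted, only line 2 matches.
        System.out.println(matchingLines(new BufferedReader(new StringReader(text)), ".")); // [2]
    }
}
```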