Representing a text file as a single unit in Java, and matching strings in the text

Posted 2024-07-18 02:38:31

How can I have a text file (or XML file) represented as a whole string, and search for (or match) a particular string in it?

I have created a BufferedReader object:

BufferedReader input = new BufferedReader(new FileReader(aFile));

and then I tried to use the Scanner class with its option to specify different delimiters, like this:

//Scanner scantext = new Scanner(input);
//Scanner scantext = new Scanner(input).useDelimiter("");
Scanner scantext = new Scanner(input).useDelimiter("\n");
while (scantext.hasNext()) {  ... }

Using the Scanner class like this, I can read the text either line by line or word by word, but that doesn't help, because the text I want to process sometimes contains

</review><review>

and I would like to say: if you find "<review>" anywhere in the text, do something with the following lines (or piece of text) until you find "</review>". The problem is that <review> and </review> appear in different places in the text and are sometimes glued to other text, so using whitespace as the delimiter doesn't help.

I have thought that I might use the regular expression API in Java (the Pattern and Matcher classes), but they seem to match a particular string or line, and I want to treat the text as one continuous string (at least that was my impression from what I have read about them). Could you tell me which structures/methods/classes I should use in this case? Thank you.

Answers (6)

森林迷了鹿 2024-07-25 02:38:31

Don't try to parse XML with regular expressions; it leads only to pain. There are a lot of very nice existing XML APIs in Java already; why try to reinvent them?

Anyway, to search for a string in a text file, you should:

  1. Load the file as a string (example)
  2. Create a Pattern to search for
  3. Use a Matcher to iterate through any matches
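The three steps above can be sketched roughly as follows. This is a minimal illustration, not the answerer's code: it assumes Java 11+ (for Files.readString and Path.of) and a UTF-8 file, and the class name SearchInFile and the helper findAll are hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; the names are illustrative only.
public class SearchInFile {

    /** Returns the start offset of every match of regex in text. */
    static List<Integer> findAll(String text, String regex) {
        Pattern pattern = Pattern.compile(regex);   // step 2: create a Pattern
        Matcher matcher = pattern.matcher(text);    // the Matcher scans the whole string, across lines
        List<Integer> starts = new ArrayList<>();
        while (matcher.find()) {                    // step 3: iterate through the matches
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) throws IOException {
        // step 1: load the entire file as one String (Java 11+, decoded as UTF-8)
        String text = Files.readString(Path.of(args[0]));
        for (int start : findAll(text, args[1])) {
            System.out.println("match at offset " + start);
        }
    }
}
```

With Java 11+ it can be run directly as java SearchInFile.java reviews.xml "<review>"; note that the second argument is interpreted as a regular expression, not a literal string.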

似梦非梦 2024-07-25 02:38:31

It looks to me as though you are trying to work with a structured XML file, so I would suggest looking into javax.xml.parsers.DocumentBuilder or the other built-in APIs for parsing the document.
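A minimal DOM sketch of that suggestion, assuming the file is well-formed XML with a single root element wrapping the review elements (glued tags such as </review><review> are no problem for a parser). DomReviews and reviewTexts are hypothetical names; DocumentBuilderFactory, DocumentBuilder.parse and getElementsByTagName are standard JDK APIs.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example class; the names are illustrative only.
public class DomReviews {

    /** Parses the XML and returns the text content of every <review> element, in document order. */
    static List<String> reviewTexts(InputSource source) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(source);                 // builds the whole tree in memory
            NodeList nodes = doc.getElementsByTagName("review");  // every <review>, at any depth
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    static List<String> reviewTexts(String xml) {
        return reviewTexts(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) {
        // A file URI as system id lets the parser open the file and detect its encoding itself
        InputSource source = new InputSource(new File(args[0]).toURI().toString());
        reviewTexts(source).forEach(System.out::println);
    }
}
```

DOM keeps the whole document in memory, which is convenient for random access but can be costly for very large files.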

野却迷人 2024-07-25 02:38:31

Use an XML parser.

Or use XPath, as in this example.
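A minimal XPath sketch, assuming a well-formed XML file; XPathReviews and reviewTexts are hypothetical names, while XPathFactory, XPath.evaluate(String, InputSource, QName) and XPathConstants.NODESET are part of the JDK's javax.xml.xpath package.

```java
import java.io.File;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; the names are illustrative only.
public class XPathReviews {

    /** Evaluates //review against the document and returns each matching element's text. */
    static List<String> reviewTexts(InputSource source) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // "//review" selects every <review> element at any depth
            NodeList nodes = (NodeList) xpath.evaluate("//review", source, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed", e);
        }
    }

    static List<String> reviewTexts(String xml) {
        return reviewTexts(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) {
        InputSource source = new InputSource(new File(args[0]).toURI().toString());
        reviewTexts(source).forEach(System.out::println);
    }
}
```

Changing the expression (for example to select only reviews under a particular parent) requires no change to the surrounding Java code, which is the main appeal of XPath here.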

ζ澈沫 2024-07-25 02:38:31

I have thought that I might use the regular expression API in Java (the Pattern and Matcher classes), but they seem to match a particular string or line, and I want to have the text as one continuous string

Um, does something prevent you from reading the XML file into a String, and then operating on that, using the regular expression API?

You can easily read a file into a String using e.g. FileUtils from Apache Commons IO: see readFileToString(File file, String encoding).
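Once the file is a single String, one regex can capture everything between <review> and </review>, even when a review spans several lines or the tags are glued to other text. This is a rough sketch assuming Java 11+; it uses the JDK's Files.readString in place of Commons IO's FileUtils.readFileToString, and RegexReviews and reviewBodies are hypothetical names. It only handles plain, non-nested tags without attributes, comments or CDATA, which is why the other answers recommend a real parser.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; the names are illustrative only.
public class RegexReviews {

    // DOTALL lets '.' match line terminators, so a review may span several lines;
    // the reluctant '.*?' stops at the nearest </review> instead of the last one in the file.
    private static final Pattern REVIEW = Pattern.compile("<review>(.*?)</review>", Pattern.DOTALL);

    /** Returns the text between each <review> ... </review> pair, in order. */
    static List<String> reviewBodies(String text) {
        List<String> bodies = new ArrayList<>();
        Matcher m = REVIEW.matcher(text);
        while (m.find()) {
            bodies.add(m.group(1)); // group 1 is the captured body
        }
        return bodies;
    }

    public static void main(String[] args) throws IOException {
        // JDK alternative to FileUtils.readFileToString (Java 11+, decoded as UTF-8)
        String text = Files.readString(Path.of(args[0]));
        reviewBodies(text).forEach(System.out::println);
    }
}
```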

烟火散人牵绊 2024-07-25 02:38:31

I would also recommend using an XML parsing API. But since you only want to do something with the "review" elements, SAX might suit you better than DOM.
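A minimal SAX sketch of that idea, assuming well-formed XML and review elements that are not nested inside each other. SaxReviews, ReviewHandler and reviewTexts are hypothetical names; DefaultHandler and SAXParserFactory are standard JDK classes. The parser streams through the file and the handler only buffers text while it is inside a review element, so the whole document is never held in memory as a tree.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; the names are illustrative only.
public class SaxReviews {

    /** Collects the character data of each <review> element as the parser streams past it. */
    static class ReviewHandler extends DefaultHandler {
        final List<String> reviews = new ArrayList<>();
        private StringBuilder current; // non-null only while inside a <review> element

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if ("review".equals(qName)) {
                current = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (current != null) {
                current.append(ch, start, length); // may be called several times per element
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("review".equals(qName)) {
                reviews.add(current.toString()); // "do something" with a finished review here
                current = null;
            }
        }
    }

    static List<String> reviewTexts(InputSource source) {
        ReviewHandler handler = new ReviewHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return handler.reviews;
    }

    static List<String> reviewTexts(String xml) {
        return reviewTexts(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) {
        InputSource source = new InputSource(new File(args[0]).toURI().toString());
        reviewTexts(source).forEach(System.out::println);
    }
}
```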

迟到的我 2024-07-25 02:38:31

I think we could copy each line of the text file into a string and then try to match a substring (the search string) against that string (the line).

But this produces errors when the search string contains metacharacters such as / or #.
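A likely cause of such errors is passing the search string straight to Pattern.compile, where characters such as . * ( [ or $ are interpreted as regex syntax (an unbalanced "(" even throws PatternSyntaxException). Below is a minimal sketch of the line-by-line approach that avoids this by quoting the search string, assuming Java 11+; LiteralLineSearch and matchingLines are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical example class; the names are illustrative only.
public class LiteralLineSearch {

    /** Returns the 1-based numbers of the lines that contain needle as a literal substring. */
    static List<Integer> matchingLines(List<String> lines, String needle) {
        // Pattern.quote makes every character of needle literal, so characters that are
        // special in a regex (. * ( [ $ \ and so on) no longer change the pattern's meaning
        Pattern literal = Pattern.compile(Pattern.quote(needle));
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (literal.matcher(lines.get(i)).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Path.of(args[0])); // decoded as UTF-8
        System.out.println(matchingLines(lines, args[1]));
    }
}
```

For a plain substring test, lines.get(i).contains(needle) does the same job without any regex. Either way, a line-by-line search cannot see text that spans a line break, such as a review whose start and end tags are on different lines.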
