Java multi-line regex to replace multiple instances in a file

Posted on 2024-10-22 04:23:32

I have been searching for hours for a solution to my problem, but nothing has turned up.
Here is my code snippet, followed by the problem:

Pattern forKeys = Pattern.compile(
    "^<feature>\\s*<name>Deviation</name>.*?</feature>",
    Pattern.DOTALL | Pattern.MULTILINE);
Matcher n = forKeys.matcher("");
String aLine = null;
while ((aLine = in.readLine()) != null) {
    n.reset(aLine);
    String result = n.replaceAll("");
    out.write(result);
    out.newLine();
}

Assume the undeclared variables (in, out) are already declared.

My point is that my regex (and maybe the matcher as well) is not working properly.

I want to remove every block of the form "<feature><name>Deviation</name>*any characters here*</feature>" from the following lines:

<feature>
    <name>Deviation</name>
            <more words here>
</feature>
<feature>
    <name>Average</name>
</feature>
    <feature>
    <name>Deviation</name>
            sample words
</feature>

I think my problem is the use of repetition operators (how to match across line breaks, tabs, etc.), but I can't seem to find the correct expression.

Any ideas? Thanks in advance.

蓝色星空 2024-10-29 04:23:32

Parsing HTML or XML with regex is evil and error-prone.

Use an XML parser and things will work much better.
Here's a solution for your problem using Dom4J:

import java.io.FileOutputStream;
import java.util.Iterator;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;

// parse XML source
Document document = DocumentHelper.parseText(yourXmlText);

// get an iterator for all <feature> elements under the root
Iterator<Element> featureIterator =
    document.getRootElement().elementIterator("feature");

while (featureIterator.hasNext()) {
    Element featureElement = featureIterator.next();
    // if <feature> has a child <name> with content "Deviation"
    if ("Deviation".equals(featureElement.elementTextTrim("name"))) {
        // remove this <feature> element
        featureIterator.remove();
    }
}

// write modified XML back to file
XMLWriter writer = new XMLWriter(
    new FileOutputStream(yourXmlFile), OutputFormat.createPrettyPrint());
writer.write(document);
writer.close();
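
If adding Dom4J as a dependency is not an option, roughly the same idea can be sketched with the DOM API that ships with the JDK. This is a different technique from the Dom4J snippet, not a drop-in equivalent. It assumes the input is well-formed XML with a single root element (the sample in the question would need a root wrapper and valid content in place of "<more words here>"). The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the original answer.
final class DeviationDomSketch {

    // Returns the XML with every <feature> whose direct <name> child is "Deviation" removed.
    static String removeDeviationFeatures(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // The NodeList is live, so walk it backwards while removing nodes.
            NodeList features = doc.getElementsByTagName("feature");
            for (int i = features.getLength() - 1; i >= 0; i--) {
                Element feature = (Element) features.item(i);
                if (hasDeviationName(feature)) {
                    feature.getParentNode().removeChild(feature);
                }
            }

            // Serialize the modified tree back to a String without the XML declaration.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException
                | IOException | TransformerException e) {
            throw new IllegalArgumentException("Could not parse or serialize the XML", e);
        }
    }

    // True if the element has a direct <name> child whose trimmed text is "Deviation".
    private static boolean hasDeviationName(Element feature) {
        for (Node child = feature.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element
                    && "name".equals(child.getNodeName())
                    && "Deviation".equals(child.getTextContent().trim())) {
                return true;
            }
        }
        return false;
    }
}
```

Calling DeviationDomSketch.removeDeviationFeatures(xmlText) on a document with a root element returns the XML with only the non-Deviation features left; writing the result to a file is left out to keep the sketch short.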

Apart from that, you are also making a mistake (see my comments in the code):

// aLine is just a single line
while((aLine = in.readLine()) != null) {
     n.reset(aLine);
     // yet you want to replace a multi-line pattern
     String result = n.replaceAll("");
     out.write(result);
     out.newLine();
}

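Your regex might or might not work if you read the entire file into a String, but it cannot work when you apply it to individual lines: no single line of your input contains both <feature> and </feature>, so the multi-line pattern never matches.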
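
For completeness, here is a minimal sketch of what "read the entire file into a String" could look like if you stay with regex. It is an illustration under assumptions, not a recommendation over a parser: the class and method names are hypothetical, UTF-8 is assumed as the file encoding, and the pattern is adjusted from the question's so that it tolerates indentation before <feature> (the third block in the sample is indented, which the original "^<feature>" would not match) and also removes the line break after </feature>.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
final class DeviationRegexSketch {

    // MULTILINE lets ^ match at the start of every line; DOTALL lets . cross line breaks.
    // [ \t]* allows indentation before <feature>; \R? also consumes one trailing line break.
    private static final Pattern DEVIATION_FEATURE = Pattern.compile(
            "^[ \\t]*<feature>\\s*<name>Deviation</name>.*?</feature>[ \\t]*\\R?",
            Pattern.DOTALL | Pattern.MULTILINE);

    // Applies the pattern to the whole text at once, not line by line.
    static String removeDeviationFeatures(String wholeText) {
        return DEVIATION_FEATURE.matcher(wholeText).replaceAll("");
    }

    // Reads the source file completely, strips the matching blocks, writes the result.
    static void rewriteFile(Path source, Path target) throws IOException {
        String wholeText = new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
        byte[] cleaned = removeDeviationFeatures(wholeText).getBytes(StandardCharsets.UTF_8);
        Files.write(target, cleaned);
    }
}
```

The lazy .*? stops at the first </feature>, so this only behaves as intended while <feature> elements are not nested and "</feature>" never appears inside comments or CDATA, which is exactly the kind of fragility that makes a parser the safer choice.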
