Java multiline regex to replace multiple instances in a file

Posted on 2024-10-22 04:23:32

OK, so I have been searching for hours for a solution to my problem, but nothing has come up.
Here's my code snippet, followed by the problem:

Pattern forKeys = Pattern.compile("^<feature>\\s*<name>Deviation</name>.*?</feature>", Pattern.DOTALL | Pattern.MULTILINE);
Matcher n = forKeys.matcher("");
String aLine = null;
while ((aLine = in.readLine()) != null) {
    n.reset(aLine);
    String result = n.replaceAll("");
    out.write(result);
    out.newLine();
}

Let's just assume the undeclared variables (in, out) are already declared.

My point is that my regex (and maybe the matcher too) is not working properly.

I want to remove every block of the form "<feature><name>Deviation</name>*any characters here*</feature>" from the following lines:

<feature>
    <name>Deviation</name>
            <more words here>
</feature>
<feature>
    <name>Average</name>
</feature>
    <feature>
    <name>Deviation</name>
            sample words
</feature>

I think my problem is the use of repetition operators (how to match across line breaks, tabs, etc.), but I can't seem to find the correct expression.

Any ideas? Thanks in advance.

Comments (1)

蓝色星空 2024-10-29 04:23:32

Parsing HTML or XML with regex is evil and error-prone.

Use an XML parser and things will work much better.
Here's a solution for your problem using Dom4J:

import java.io.FileOutputStream;
import java.util.Iterator;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;

// parse XML source
Document document = DocumentHelper.parseText(yourXmlText);

// get an iterator for all <feature> children of the root element
Iterator<Element> featureIterator =
    document.getRootElement().elementIterator("feature");

while (featureIterator.hasNext()) {
    Element featureElement = featureIterator.next();
    // if <feature> has a child <name> with content "Deviation"
    if ("Deviation".equals(featureElement.elementTextTrim("name"))) {
        // remove this <feature> element
        featureIterator.remove();
    }
}

// write modified XML back to file
XMLWriter writer = new XMLWriter(
    new FileOutputStream(yourXmlFile), OutputFormat.createPrettyPrint());
writer.write(document);
writer.close();
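
If adding Dom4J is not an option, the same idea can be sketched with the DOM parser that ships with the JDK. This is only an illustration under some assumptions: the class and method names are made up for the example, the input is assumed to be well-formed XML with a single root element (the snippet in the question has no root and would need one), and unlike the Dom4J version it looks at the first <name> descendant of each <feature> rather than only a direct child.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this example only.
class RemoveDeviationDom {

    // Returns the XML with every <feature> whose first <name> is "Deviation" removed.
    static String removeDeviationFeatures(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // getElementsByTagName returns a live list, so collect the
            // matches first and remove them in a second pass.
            NodeList features = doc.getElementsByTagName("feature");
            List<Element> toRemove = new ArrayList<>();
            for (int i = 0; i < features.getLength(); i++) {
                Element feature = (Element) features.item(i);
                NodeList names = feature.getElementsByTagName("name");
                if (names.getLength() > 0
                        && "Deviation".equals(names.item(0).getTextContent().trim())) {
                    toRemove.add(feature);
                }
            }
            for (Element feature : toRemove) {
                feature.getParentNode().removeChild(feature);
            }

            // Serialize the modified document back to a String.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Parsing and transforming throw checked exceptions; wrap them for brevity.
            throw new IllegalStateException("Could not process XML", e);
        }
    }
}
```

Whitespace text nodes between the removed elements are left in place, so the output may contain blank lines where the <feature> blocks used to be.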

Apart from that, you are also making a mistake (see my comments):

// aLine is just a single line
while((aLine = in.readLine()) != null) {
     n.reset(aLine);
     // yet you want to replace a multi-line pattern
     String result = n.replaceAll("");
     out.write(result);
     out.newLine();
}

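Your regex might or might not work if you read the entire file into a String, but it can't work if you apply it to individual lines: readLine() returns one line at a time without its line terminator, so no single input ever contains both <feature> and </feature>.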
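
For completeness, here is a rough sketch of what the whole-String approach could look like if you stay with regex despite the caveats above. The class and method names are invented for the example. The pattern is the one from the question with two assumed adjustments: `[ \t]*` after `^` so that an indented <feature> (like the third block in the sample) still matches, and a trailing `[ \t]*\R?` so the line break after </feature> is removed too. It still assumes <name> directly follows <feature> and that <feature> elements are never nested.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named for this example only.
class RemoveDeviationRegex {

    // DOTALL lets '.' cross line breaks; MULTILINE lets '^' match at each line start.
    private static final Pattern DEVIATION_FEATURE = Pattern.compile(
            "^[ \\t]*<feature>\\s*<name>Deviation</name>.*?</feature>[ \\t]*\\R?",
            Pattern.DOTALL | Pattern.MULTILINE);

    // Apply the pattern to the complete text, not to individual lines.
    static String removeDeviationFeatures(String xml) {
        return DEVIATION_FEATURE.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        String xml = String.join("\n",
                "<feature>",
                "    <name>Deviation</name>",
                "            <more words here>",
                "</feature>",
                "<feature>",
                "    <name>Average</name>",
                "</feature>",
                "    <feature>",
                "    <name>Deviation</name>",
                "            sample words",
                "</feature>",
                "");
        // Only the "Average" block should remain in the printed text.
        System.out.print(removeDeviationFeatures(xml));
    }
}
```

To run this against a real file, the content could be loaded in one go, for example with `new String(Files.readAllBytes(path), StandardCharsets.UTF_8)`, instead of looping over `readLine()`.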
