Saving a substring with regular expressions
I'm new to regular expressions in Java (or any language, for that matter) and I want to use them to do a find. The tricky part I don't understand is how to replace something inside the string that matches.
For example, if the line I'm looking for is
Person item6 [can {item thing [wrap]}]
I'm able to write a regex that finds that line, but my problem is finding out what the word "thing" is, since it may differ from line to line. I may want to either replace that word with something else or save it in a variable for later. Is there an easy way to do this using Java's regex engine?
Yes. You wrap it in a "capturing group", which is just a pair of parentheses ( ) around the part of the regular expression that matches the interesting word.
Pattern and Matcher come from java.util.regex. There are some shortcuts in the String class, but these classes are the most flexible.
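A minimal sketch of that idea, assuming the line format from the question (the word always sits between "{item " and " ["). The class CaptureWord and the method extractWord are hypothetical names chosen for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureWord {
    // The parentheses form capturing group 1 around \w+; the literal "{" and "["
    // are escaped because they are regex metacharacters.
    private static final Pattern ITEM_WORD = Pattern.compile("\\{item (\\w+) \\[");

    // Returns the word after "{item " and before " [", or null if the line doesn't match
    static String extractWord(String line) {
        Matcher m = ITEM_WORD.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Save the captured word in a variable for later use
        String word = extractWord("Person item6 [can {item thing [wrap]}]");
        System.out.println(word);
    }
}
```

Here m.group(1) returns the text captured by the first pair of parentheses, so the word can be stored or reused without touching the rest of the line.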
The problem specification isn't very clear, but here are some ideas that may work:
Use lookarounds and replaceAll/First
The regex (?<=\{item )\w+(?= \[) matches the \w+ that is preceded by the string "{item " and followed by the string " [". Lookarounds are used to match exactly the \w+ only. The metacharacters { and [ are escaped as necessary.
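A minimal sketch of this approach with String.replaceAll. The class name, the replaceWord helper, and the extra "Cat" and "Dog" sample lines are hypothetical, added only to exercise the pattern:

```java
public class LookaroundReplace {
    // (?<=\{item )  lookbehind: the match must be preceded by "{item "
    // \w+           the word itself, the only text that is actually matched
    // (?= \[)       lookahead: the match must be followed by " ["
    static final String LOOKAROUND_REGEX = "(?<=\\{item )\\w+(?= \\[)";

    // Replaces every matching word. A $ or \ in replacement would be
    // interpreted specially by replaceAll, so plain words are assumed here.
    static String replaceWord(String text, String replacement) {
        return text.replaceAll(LOOKAROUND_REGEX, replacement);
    }

    public static void main(String[] args) {
        String text = "Person item6 [can {item thing [wrap]}]\n"
                + "Cat item7 [meow {item thang [purr]}]\n"
                + "Dog item8 [maybe {itemmmm thong [woof]}]";
        System.out.println(replaceWord(text, "STUFF"));
    }
}
```

Only the word is replaced, because the lookarounds check the surrounding text without consuming it. The "Dog" line stays unchanged: the lookbehind accepts only the exact prefix "{item ", not "{itemmmm ".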
References
String.replaceAll(String regex, String replacement)
Use capturing groups instead of lookarounds
Lookarounds should be used judiciously. Lookbehinds in particular are very limited in Java: the lookbehind pattern must have a bounded maximum length. A more commonly applied technique is to use capturing groups to match more than just the interesting parts.
The regex (\{item+ )(\w+)( \[) matches a pattern similar to the one before, \w+, but also includes the "{item " prefix and the " [" suffix. Additionally, the m in item can repeat without limitation (something that can't be matched in a lookbehind in Java).
Our pattern has 3 capturing groups: group 1 is the "{item " prefix (with any number of m's), group 2 is the \w+ word, and group 3 is the " [" suffix.
Note that we can't simply replace what we matched with "STUFF", because we also match some "extraneous" parts. We're not interested in replacing them, so we capture these parts and just put them back in the replacement string. In Java, a replacement string refers to what a group captured with the $ sigil; hence the $1 and $3 in a replacement such as "$1STUFF$3".
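A sketch of the capturing-group version, under the same assumptions (hypothetical class, helper, and sample lines). The helper passes the new word through Matcher.quoteReplacement so that a $ or \ in it is treated literally rather than as a group reference:

```java
import java.util.regex.Matcher;

public class CapturingGroupReplace {
    // Group 1: "{item" with one or more m's, plus a space ("{" is escaped)
    // Group 2: the word to replace
    // Group 3: " [" ("[" is escaped)
    static final String CAPTURING_REGEX = "(\\{item+ )(\\w+)( \\[)";

    static String replaceWord(String text, String newWord) {
        // $1 and $3 put the captured prefix and suffix back unchanged;
        // quoteReplacement makes any $ or \ in newWord literal
        return text.replaceAll(CAPTURING_REGEX, "$1" + Matcher.quoteReplacement(newWord) + "$3");
    }

    public static void main(String[] args) {
        String text = "Person item6 [can {item thing [wrap]}]\n"
                + "Cat item7 [meow {item thang [purr]}]\n"
                + "Dog item8 [maybe {itemmmm thong [woof]}]";
        System.out.println(replaceWord(text, "STUFF"));
    }
}
```

Unlike the lookaround version, this also rewrites the "{itemmmm" line, since item+ allows any number of m's.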
Use a Matcher for more flexibility
Not everything can be done with replacement strings. For example, Java replacement strings have no postprocessing syntax to capitalize a captured string. In these more general replacement scenarios, you can loop over the matches with a Matcher and build each replacement in Java code.
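A minimal sketch of such a loop, assuming the same pattern as above and a hypothetical class name. It upper-cases each captured word using group(int), appendReplacement/appendTail, and String.format:

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherLoop {
    // Same three groups as before: prefix, word, suffix
    static final Pattern CAPTURING = Pattern.compile("(\\{item+ )(\\w+)( \\[)");

    static String upperCaseWords(String text) {
        Matcher m = CAPTURING.matcher(text);
        StringBuffer sb = new StringBuffer(); // StringBuffer works on every Java version
        while (m.find()) {
            String word = m.group(2); // the captured word; could also be saved for later use
            // Build the replacement in Java code instead of with $-references;
            // Locale.ROOT avoids locale-specific case rules (e.g. Turkish dotted I)
            String replacement = String.format("%s%s%s",
                    m.group(1), word.toUpperCase(Locale.ROOT), m.group(3));
            // Appends the text before this match, then the replacement taken literally
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb); // appends the text after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Person item6 [can {item thing [wrap]}]\n"
                + "Dog item8 [maybe {itemmmm thong [woof]}]";
        System.out.println(upperCaseWords(text));
    }
}
```

On Java 9 or later, Matcher.replaceAll(Function<MatchResult, String>) runs an equivalent loop for you, for example CAPTURING.matcher(text).replaceAll(mr -> Matcher.quoteReplacement(mr.group(1) + mr.group(2).toUpperCase(Locale.ROOT) + mr.group(3))). Its result is still treated as a replacement string, which is why quoteReplacement is kept.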
References
java.util.regex.Pattern
java.util.regex.Matcher
group(int) - access individual captured strings
appendReplacement - unfortunately StringBuffer-only before Java 9, which added a StringBuilder overload
java.util.Formatter - used in printf and String.format