Non-greedy regular expressions in Java
I have the following code:
// uses java.util.regex.Matcher and java.util.regex.Pattern
public static void createTokens() {
    String test = "test is a word word word word big small";
    Matcher mtch = Pattern.compile("test is a (\\s*.+?\\s*) word (\\s*.+?\\s*)").matcher(test);
    while (mtch.find()) {
        // print every capture group of the current match
        for (int i = 1; i <= mtch.groupCount(); i++) {
            System.out.println(mtch.group(i));
        }
    }
}
It produces the following output:
word
w
But I expected:
word
word
Can somebody please explain why this happens?
Comments (2)
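Because your quantifiers are non-greedy (lazy), each one matches as little text as possible while still letting the overall match succeed. Nothing follows the second group in the pattern, so its .+? can stop after a single character, which is why you get w.
Remove the ? in the second group, and you'll get
word
word word big small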
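To make the difference concrete, here is a minimal sketch that runs the asker's pattern next to the variant with a greedy second group. The class name LazyVsGreedyDemo and the helper captureGroups are made up for illustration; only java.util.regex calls from the standard library are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
public class LazyVsGreedyDemo {

    // Collects every capture group of every match of regex in input.
    public static List<String> captureGroups(String regex, String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String test = "test is a word word word word big small";

        // Lazy second group: .+? stops after one character because
        // nothing after the group forces it to consume more.
        System.out.println(captureGroups(
                "test is a (\\s*.+?\\s*) word (\\s*.+?\\s*)", test));
        // prints [word, w]

        // Greedy second group: .+ runs to the end of the input.
        System.out.println(captureGroups(
                "test is a (\\s*.+?\\s*) word (\\s*.+\\s*)", test));
        // prints [word, word word big small]
    }
}
```

The first group stays lazy in both runs. It still captures the whole first word, because the literal " word " that follows it cannot match until the group has consumed "word".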
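\\s* matches any number of whitespace characters, including zero, so (\\s*.+?\\s*) can match just w. To make sure a group matches a word delimited by whitespace, try (\\s+.+?\\s+). Note that \\s+ has to consume at least one whitespace character itself, so the literal spaces next to the groups in the original pattern have to be dropped for this to match, and the captured text will include the surrounding whitespace.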
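The following sketch illustrates the \\s* versus \\s+ point on the asker's input. The class name WhitespaceQuantifierDemo and the helper firstMatchGroups are hypothetical, and the pattern without literal spaces around the groups is an adaptation for this example, not something stated in the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
public class WhitespaceQuantifierDemo {

    // Returns the capture groups of the first match, or an empty list if nothing matches.
    public static List<String> firstMatchGroups(String regex, String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String test = "test is a word word word word big small";

        // \s* accepts the empty string, \s+ does not.
        System.out.println(Pattern.matches("\\s*", "")); // prints true
        System.out.println(Pattern.matches("\\s+", "")); // prints false

        // Keeping the literal spaces leaves no whitespace for \s+ to consume.
        System.out.println(firstMatchGroups(
                "test is a (\\s+.+?\\s+) word (\\s+.+?\\s+)", test)); // prints []

        // Without the literal spaces, each group captures " word " with its spaces.
        for (String g : firstMatchGroups("test is a(\\s+.+?\\s+)word(\\s+.+?\\s+)", test)) {
            System.out.println("[" + g + "]"); // prints [ word ] twice
        }
    }
}
```

Calling trim() on each group, or capturing with (\\S+) between the literal spaces instead, would be one way to get the bare word, if that is what is wanted.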