Word boundary regex problem (overlap)
Given the following code:
var myList = new List<string> { "red shirt", "blue", "green", "red" };
Regex r = new Regex("\\b(" + string.Join("|", myList.ToArray()) + ")\\b");
MatchCollection m = r.Matches("Alfred has a red shirt and blue tie");
I want the result of m to include "red shirt", "blue", "red", since all of those are in the string, but I am only getting "red shirt", "blue". What can I do to include overlaps?
Comments (1)
It seems to me that the regex engine consumes the matched text as soon as it finds the first valid match, so the same characters cannot be matched a second time. I don't have a Windows compiler set up right now, so I can't give an apples-to-apples comparison, but I see similar results in Perl.
I think your regex would look something like this after being joined.
'\b(red shirt|blue|green|red)\b'
Testing this regex, I get the same result you do: "red shirt", "blue".
Moving "red shirt" to the end of the alternation list changes the result:
'\b(red|blue|green|red shirt)\b'
I now see "red", "blue".
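To check the same thing in .NET rather than Perl, here is a minimal sketch that runs both orderings against the question's input. The class and method names (AlternationOrderDemo, MatchValues) are made up for illustration; only Regex.Matches and Match.Value come from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class AlternationOrderDemo
{
    // Returns the text of every match of `pattern` in `input`, in order.
    // (Hypothetical helper, not part of the original question.)
    public static List<string> MatchValues(string pattern, string input)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }

    public static void Main()
    {
        const string input = "Alfred has a red shirt and blue tie";

        // "red shirt" is tried before "red", so it wins at that position.
        Console.WriteLine(string.Join(", ",
            MatchValues(@"\b(red shirt|blue|green|red)\b", input)));
        // prints: red shirt, blue

        // "red" is tried first, so "red shirt" is never reached.
        Console.WriteLine(string.Join(", ",
            MatchValues(@"\b(red|blue|green|red shirt)\b", input)));
        // prints: red, blue
    }
}
```

Either way, each stretch of the input is reported by at most one match, which is why a single alternation cannot return both "red" and "red shirt".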
By changing the regex to a slightly more complicated form, you might be able to get the results you want.
\b(blue|green|(red) shirt)\b
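This should capture "red" in its own subgroup while still matching "red shirt" as a whole.
Reading both the match and the subgroup returns "red shirt", "red", "blue". Note that with this pattern "red" is only found when it is followed by " shirt".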
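In .NET terms, the nested group is read through Match.Groups. The following is a rough sketch of how that might look; the class and method names (SubgroupDemo, MatchesWithSubgroup) are invented for the example, and group index 2 is assumed because (red) is the second opening parenthesis in the pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SubgroupDemo
{
    // Collects each whole match and, when the nested (red) group took part
    // in that match, its captured text as well. (Hypothetical helper.)
    public static List<string> MatchesWithSubgroup(string input)
    {
        var regex = new Regex(@"\b(blue|green|(red) shirt)\b");
        var results = new List<string>();

        foreach (Match m in regex.Matches(input))
        {
            results.Add(m.Value);               // e.g. "red shirt" or "blue"
            if (m.Groups[2].Success)            // group 2 is the inner (red)
            {
                results.Add(m.Groups[2].Value); // "red"
            }
        }
        return results;
    }

    public static void Main()
    {
        var results = MatchesWithSubgroup("Alfred has a red shirt and blue tie");
        Console.WriteLine(string.Join(", ", results));
        // prints: red shirt, red, blue
    }
}
```

The pattern has to be written by hand for each overlapping pair, so this does not scale well if the list is built dynamically with string.Join.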
If you are going to have many word groups that need multiple matches, like "red" and "red shirt", the simpler way is to loop through your List of strings and match them one at a time.
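A minimal sketch of that loop, assuming you only need to know which terms occur as whole words. The names OverlapDemo and FindTerms are made up here, and Regex.Escape is an addition the original code did not have, included so that terms containing regex metacharacters are treated literally.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OverlapDemo
{
    // Tests each term with its own regex, so one term matching does not
    // prevent an overlapping term from matching too. (Hypothetical helper.)
    public static List<string> FindTerms(IEnumerable<string> terms, string input)
    {
        var found = new List<string>();
        foreach (string term in terms)
        {
            // Regex.Escape keeps characters such as "." or "+" literal.
            string pattern = @"\b" + Regex.Escape(term) + @"\b";
            if (Regex.IsMatch(input, pattern))
            {
                found.Add(term);
            }
        }
        return found;
    }

    public static void Main()
    {
        var myList = new List<string> { "red shirt", "blue", "green", "red" };
        var found = FindTerms(myList, "Alfred has a red shirt and blue tie");
        Console.WriteLine(string.Join(", ", found));
        // prints: red shirt, blue, red
    }
}
```

The "red" inside "Alfred" is still rejected because \b requires a word boundary on both sides. If you need positions or every occurrence rather than a yes/no per term, Regex.Matches can be used inside the same loop.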
Since there are so many ways to do regexp, I am probably missing an obvious and elegant solution.