Word boundary regex problem (overlaps)

Posted 2024-09-28 12:04:57

Given the following code:

var myList = new List<string> { "red shirt", "blue", "green", "red" };
Regex r = new Regex("\\b(" + string.Join("|", myList.ToArray()) + ")\\b");
MatchCollection m = r.Matches("Alfred has a red shirt and blue tie");

I want the results in m to include "red shirt", "blue", and "red", since all of those are in the string, but I am only getting "red shirt" and "blue". What can I do to include overlaps?


Comments (1)

活雷疯 2024-10-05 12:04:57

It seems to me that the regex engine consumes the matched text as soon as it finds the first valid match, so a later match cannot overlap it. I don't have a Windows compiler set up right now, so I can't give an apples-to-apples comparison, but I see similar results in Perl.

I think your regex would look something like this after being joined.

'\b(red shirt|blue|green|red)\b'

Testing this regex, I see the same result you do: "red shirt", "blue".
Now move "red shirt" to the end of the alternation:

'\b(red|blue|green|red shirt)\b'

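I now see "red", "blue". The alternatives are tried left to right, so "red" wins before "red shirt" is ever attempted, and the search then resumes after the matched text.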
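To check the ordering effect in .NET rather than Perl, a minimal sketch along these lines could be used. It assumes C# 9 or later (top-level statements); the helper name `MatchValues` and the variable names are made up for illustration and do not come from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const string input = "Alfred has a red shirt and blue tie";

// Hypothetical helper: run a pattern over the input and return the matched texts in order.
string[] MatchValues(string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

// "red shirt" is listed first, so it wins at the position where both alternatives could match.
string[] longestFirst = MatchValues(@"\b(red shirt|blue|green|red)\b");

// "red" is listed first, so "red shirt" is never attempted at that position.
string[] shortestFirst = MatchValues(@"\b(red|blue|green|red shirt)\b");

Console.WriteLine(string.Join(", ", longestFirst));
Console.WriteLine(string.Join(", ", shortestFirst));
```

In both cases the "red" inside "Alfred" is skipped, because `\b` requires a word boundary before the "r".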

With a slightly more complicated regex, you might be able to get the results you want:

\b(blue|green|(red) shirt)\b

This should capture "red" as its own subgroup while the outer group still captures "red shirt".

Reading the whole matches together with the inner group gives "red shirt", "red", "blue".
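In .NET the inner group is not returned as a separate entry in the `MatchCollection`; it has to be read from each `Match`. A rough sketch of how that might look, assuming C# 9 or later (top-level statements), with variable names chosen only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

const string text = "Alfred has a red shirt and blue tie";
var nested = new Regex(@"\b(blue|green|(red) shirt)\b");

var found = new List<string>();
foreach (Match m in nested.Matches(text))
{
    found.Add(m.Value);              // the whole match, e.g. "red shirt"
    if (m.Groups[2].Success)         // group 2 is the inner (red); group 1 is the outer group
        found.Add(m.Groups[2].Value);
}

Console.WriteLine(string.Join(", ", found));
```

Note that this pattern only finds "red" when it is followed by " shirt"; a standalone "red" elsewhere in the string would not match.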

If you are going to have many word groups that need multiple matches, like "red" and "red shirt", the simpler way is to loop through your List of strings and match them one at a time.
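The loop could be sketched roughly as follows, assuming C# 9 or later (top-level statements). The `hits` and `ordered` names, the tuple shape, and the tie-breaking rule (longer term first at the same position) are illustrative choices, not part of the original question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var myList = new List<string> { "red shirt", "blue", "green", "red" };
const string sentence = "Alfred has a red shirt and blue tie";

var hits = new List<(string Term, int Index)>();
foreach (string term in myList)
{
    // Regex.Escape guards against terms that contain regex metacharacters.
    string pattern = @"\b" + Regex.Escape(term) + @"\b";
    foreach (Match m in Regex.Matches(sentence, pattern))
        hits.Add((m.Value, m.Index));
}

// Sort by position in the sentence; at the same position, put the longer term first.
var ordered = hits
    .OrderBy(hit => hit.Index)
    .ThenByDescending(hit => hit.Term.Length)
    .ToList();

foreach (var entry in ordered)
    Console.WriteLine($"{entry.Term} at {entry.Index}");
```

Because each term gets its own pass over the string, a match for one term cannot hide a match for another, which is what allows "red" and "red shirt" to be reported at the same position.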

Since there are so many ways to write a regex, I am probably missing an obvious and elegant solution.
