C# - Regular expression to match whole words
I need to match all the whole words that contain a given string.
string s = "ABC.MYTESTING
XYZ.YOUTESTED
ANY.TESTING";
Regex r = new Regex("(?<TM>[!\..]*TEST.*)", ...);
MatchCollection mc = r.Matches(s);
I need the result to be:
MYTESTING
YOUTESTED
TESTING
But I get:
TESTING
TESTED
.TESTING
How do I achieve this with regular expressions?
Edit: Extended sample string.
Comments (6)
If you are looking for all words that contain 'TEST', you should use a pattern built on \w, such as \w*TEST\w*.
\w matches word characters; for ASCII text it is equivalent to [A-Za-z0-9_]. In .NET it also matches Unicode letters and digits unless RegexOptions.ECMAScript is set.
Keep it simple: why not just try
\w*TEST\w*
as the match pattern.
I get the results you are expecting with the following:
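A minimal sketch of how the \w*TEST\w* pattern might be applied to the sample string from the question. The class and method names (WholeWordDemo, FindWords) are illustrative and not taken from the thread, and the line breaks are written as explicit \n so the result does not depend on the source file's line endings.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class WholeWordDemo
{
    // Hypothetical helper: returns every run of word characters that contains "TEST".
    public static string[] FindWords(string input)
    {
        // Verbatim string, so \w reaches the regex engine unchanged.
        return Regex.Matches(input, @"\w*TEST\w*")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    static void Main()
    {
        // Sample text from the question, with explicit \n line breaks.
        string s = "ABC.MYTESTING\nXYZ.YOUTESTED\nANY.TESTING";
        foreach (string word in FindWords(s))
        {
            Console.WriteLine(word);
        }
        // prints MYTESTING, YOUTESTED and TESTING, one per line
    }
}
```

Because \w never matches the dot or a line break, each match stops at the edges of the word without any extra anchors.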
Try using \b. It is the regex anchor for a word boundary. If you wanted to match both words you could use:
BTW, .NET doesn't need the surrounding / delimiters, and the trailing i is just a case-insensitive match flag.
.NET alternative:
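One way the \b suggestion could look in .NET, as a sketch rather than the answerer's original code: the pattern is passed as a plain string with no / delimiters, and RegexOptions.IgnoreCase takes the place of the i flag. The helper name and the mixed-case sample text are assumptions made for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class WordBoundaryDemo
{
    // Hypothetical helper: finds whole words containing "test" in any letter case.
    public static string[] FindWordsIgnoreCase(string input)
    {
        // \b anchors each end of the match at a word boundary.
        // RegexOptions.IgnoreCase plays the role of the /i flag in other regex flavors.
        var regex = new Regex(@"\b\w*TEST\w*\b", RegexOptions.IgnoreCase);
        return regex.Matches(input)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    static void Main()
    {
        // Mixed-case variation of the question's sample (an assumption for illustration).
        string s = "abc.MyTesting\nxyz.youtested\nany.TESTING";
        Console.WriteLine(string.Join(", ", FindWordsIgnoreCase(s)));
        // prints: MyTesting, youtested, TESTING
    }
}
```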
I think you can achieve it using groups.
It works exactly like you want.
BTW, your regular expression pattern should be a verbatim string (@"").
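A sketch of the group-based approach, assuming the named group TM from the question is kept and the pattern is written as a verbatim string. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class NamedGroupDemo
{
    // Hypothetical helper: collects the text captured by the named group TM.
    public static List<string> FindViaGroup(string input)
    {
        // Verbatim string: backslashes are passed to the regex engine as written.
        // The regular-string equivalent would be "(?<TM>\\w*TEST\\w*)".
        var regex = new Regex(@"(?<TM>\w*TEST\w*)");
        var words = new List<string>();
        foreach (Match m in regex.Matches(input))
        {
            words.Add(m.Groups["TM"].Value);
        }
        return words;
    }

    static void Main()
    {
        string s = "ABC.MYTESTING\nXYZ.YOUTESTED\nANY.TESTING";
        Console.WriteLine(string.Join(", ", FindViaGroup(s)));
        // prints: MYTESTING, YOUTESTED, TESTING
    }
}
```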
First, as @manojlds said, you should use verbatim strings for regexes whenever possible. Otherwise you'll have to use two backslashes in most of your regex escape sequences, not just one (e.g. [!\\..]*).
Second, if you want to match anything but a dot, that part of the regex should be [^.]*. The caret (^) is the metacharacter that inverts a character class, not the exclamation mark (!), and the dot has no special meaning inside a character class, so it doesn't need to be escaped. As written, [!\..] matches a literal ! or a literal dot. But you should probably use \w* instead, or even [A-Z]*, depending on what exactly you mean by "word".
That way you don't need to bother with word boundaries, though they don't hurt:
Finally, if you're always taking the whole match anyway, you don't need to use a capturing group:
The matched text will be available via the Match object's Value property.
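To make the difference between the two character classes concrete, here is a small sketch that runs the question's pattern and a negated-class variant against the same sample text and reads each result from Match.Value, with no capturing group. The helper name MatchValues is an assumption, and the line breaks are written as explicit \n.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class CharacterClassDemo
{
    // Hypothetical helper: returns the full text of every match of pattern in input.
    public static string[] MatchValues(string pattern, string input)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)   // whole match; no capturing group needed
                    .ToArray();
    }

    static void Main()
    {
        string s = "ABC.MYTESTING\nXYZ.YOUTESTED\nANY.TESTING";

        // Pattern from the question: [!\..] only matches '!' or '.'.
        Console.WriteLine(string.Join(", ", MatchValues(@"[!\..]*TEST.*", s)));
        // prints: TESTING, TESTED, .TESTING

        // Negated class: [^.] matches any character except '.'.
        Console.WriteLine(string.Join(", ", MatchValues(@"[^.]*TEST.*", s)));
        // prints: MYTESTING, YOUTESTED, TESTING
    }
}
```

Note that [^.]* can also run across line breaks when a line contains no dot, which is one reason \w* may be the safer choice for this kind of input.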