Regex match entire words only
I have a regular expression that I'm using to find all the words in a given block of content, case-insensitively, that are contained in a glossary stored in a database. Here's my pattern:
/($word)/i
The problem is that if I use /(Foo)/i, words like Food get matched. There needs to be whitespace or a word boundary on both sides of the word.
How can I modify my expression to match only the word Foo when it is a word at the beginning, middle, or end of a sentence?
Comments (8)
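Use word boundaries (\b) on both sides of the word, for example /\b($word)\b/i.
Or, if you're searching for a term such as "S.P.E.C.T.R.E.", as in Sinan Ünür's example, require a non-word character or the start or end of the string on each side instead, and quote the term so that its dots are matched literally.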
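As a rough illustration of both ideas in JavaScript (an assumption; the question's /($word)/i could equally be PHP or Perl), the sketch below builds each pattern from a glossary term. The helper names escapeRegExp, wholeWordPattern and delimitedPattern are hypothetical, and the sample sentences are made up.

```javascript
// Escape regex metacharacters so a glossary term is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Case-insensitive, global pattern that matches `word` only as a whole word.
function wholeWordPattern(word) {
  return new RegExp("\\b(" + escapeRegExp(word) + ")\\b", "gi");
}

// Variant for terms that begin or end with non-word characters:
// each side must be a non-word character or an end of the string.
function delimitedPattern(word) {
  return new RegExp("(?:\\W|^)(" + escapeRegExp(word) + ")(?:\\W|$)", "i");
}

// Made-up sample text: "food" must not be reported as a hit for "Foo".
const sample = "Foo likes food. Is foo here? Yes, FOO!";
console.log(sample.match(wholeWordPattern("Foo"))); // [ 'Foo', 'foo', 'FOO' ]

// The dotted term is captured in group 1 of the delimited variant.
const dotted = "Agents of S.P.E.C.T.R.E. attacked".match(delimitedPattern("S.P.E.C.T.R.E."));
console.log(dotted && dotted[1]); // S.P.E.C.T.R.E.
```

Note that the delimited variant also consumes the neighbouring characters, so the term itself should be read from the capture group, not from the whole match.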
To match any whole word you would use the pattern (\w+).
Assuming you are using PCRE or something similar, see this live example:
https://regex101.com/r/FGheKd/1
Matching any whole word on the command line with (\w+)
I'll be using the phpsh interactive shell on Ubuntu 12.10 (http://releases.ubuntu.com/12.10/) to demonstrate the PCRE regex engine through the function preg_match.
Start phpsh, put some content into a variable, and match on a word.
The preg_match function uses the PCRE engine within the PHP language to analyze the variables $content1, $content2 and $content3 with the (\w)+ pattern. $content1 and $content2 contain at least one word; $content3 does not.
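The same check can be sketched in JavaScript instead of phpsh. The three strings below are hypothetical stand-ins, since the contents of the original variables are not shown, and firstWord is an illustrative helper name.

```javascript
// Hypothetical sample inputs; the strings from the original session are unknown.
const content1 = "hi";
const content2 = "hello world";
const content3 = "$*&^% ~!"; // contains no word characters at all

// \w+ matches a run of ASCII letters, digits, or underscores.
const anyWord = /(\w+)/;

// Return the first whole word found in `text`, or null if there is none.
function firstWord(text) {
  const match = text.match(anyWord);
  return match ? match[1] : null;
}

console.log(firstWord(content1)); // hi
console.log(firstWord(content2)); // hello
console.log(firstWord(content3)); // null
```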
Match a number of literal words on the command line with (dart|fart)
The variables gun1 and gun2 contain the string dart or fart; gun4 does not. However, it may be a problem that looking for the word fart also matches farty. To fix this, enforce word boundaries in the regex.
Match literal words on the command line with word boundaries
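A small JavaScript sketch of the difference between the two patterns follows. The gun values are hypothetical (the original session output is not shown), and report is an illustrative helper.

```javascript
// Hypothetical values for the variables named in the answer.
const guns = {
  gun1: "dart gun",
  gun2: "fart gun",
  gun3: "farty gun",
  gun4: "unicorn gun",
};

const loose = /(dart|fart)/;      // matches the letters anywhere, even inside "farty"
const strict = /\b(dart|fart)\b/; // matches only when they form a whole word

// Map each variable name to whether `pattern` matches its value.
function report(pattern) {
  return Object.fromEntries(
    Object.entries(guns).map(([name, value]) => [name, pattern.test(value)])
  );
}

console.log(report(loose));  // { gun1: true, gun2: true, gun3: true, gun4: false }
console.log(report(strict)); // { gun1: true, gun2: true, gun3: false, gun4: false }
```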
So it's the same as the previous example, except that the word fart with a \b word boundary does not exist in the content farty.

Using \b can yield surprising results. You would be better off figuring out what separates a word from its definition and incorporating that information into your pattern.
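A few cases where \b behaves unexpectedly, sketched in JavaScript with made-up inputs. The last pattern is one possible alternative, not the approach from the answer; it relies on lookbehind and Unicode property escapes, which need a reasonably recent engine.

```javascript
// 1. A hyphen is a non-word character, so part of a hyphenated term matches.
const hyphenHit = /\bword\b/.test("word-boundary"); // true

// 2. A term that ends in a non-word character has no boundary after it
//    when it is followed by whitespace.
const dottedHit = /\bS\.P\.E\.C\.T\.R\.E\.\b/.test("S.P.E.C.T.R.E. (an organisation)"); // false

// 3. In JavaScript \w is ASCII-only, so an accented letter is not a word character.
const accentedHit = /\bcaf\u00e9\b/.test("a caf\u00e9 here"); // false
const prefixHit = /\bcaf\b/.test("a caf\u00e9 here");         // true

// Possible alternative: define the separator explicitly with Unicode-aware lookarounds.
const unicodeWord = /(?<![\p{L}\p{N}_])caf\u00e9(?![\p{L}\p{N}_])/u;
const fixedHit = unicodeWord.test("a caf\u00e9 here");  // true
const pluralHit = unicodeWord.test("two caf\u00e9s");   // false

console.log({ hyphenHit, dottedHit, accentedHit, prefixHit, fixedHit, pluralHit });
```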

For those who want to validate an enum in their code, you can follow this guide.
In the regex world you can use ^ to anchor the start of a string and $ to anchor its end. Using them in combination with | could be what you want, for example ^(Male|Female)$. It will return true only for the Male or Female case.
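A minimal JavaScript sketch of such an enum check, assuming the allowed values are exactly Male and Female; isValidGender is a hypothetical helper name. It also shows why the group around the alternation matters.

```javascript
// Anchored alternation: the whole string must be one of the listed values.
const genderPattern = /^(Male|Female)$/;

function isValidGender(value) {
  return genderPattern.test(value);
}

console.log(isValidGender("Male"));   // true
console.log(isValidGender("Female")); // true
console.log(isValidGender("Males"));  // false
console.log(isValidGender("male"));   // false (no i flag, so case matters)

// Without the group, ^ binds only to "Male" and $ only to "Female",
// so the pattern means "starts with Male" OR "ends with Female".
console.log(/^Male|Female$/.test("Maleficent")); // true
```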

If you are doing it in Notepad++, a pattern such as \w+ would give you the entire word, and you can add parentheses to get it as a group. Example:
conv1 = Conv2D(64, (3, 3), activation=LeakyReLU(alpha=a), padding='valid', kernel_initializer='he_normal')(inputs)
I would like to move LeakyReLU into its own line as a comment and replace the current activation. In Notepad++ this can be done with a regex find command and a matching replace command.
The spaces in the replace command are there to keep the right formatting in my code. :)
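The find and replace patterns from this answer are not shown, so the following JavaScript sketch is only a guess at the idea: capture the variable name as a whole word with (\w+), then reuse the groups in the replacement. The find pattern, the moveActivation helper, the 'relu' replacement value, and the format of the comment line are all assumptions.

```javascript
const line =
  "conv1 = Conv2D(64, (3, 3), activation=LeakyReLU(alpha=a), padding='valid', kernel_initializer='he_normal')(inputs)";

// Group 1: variable name (a whole word). Group 2: text up to "activation=".
// Group 3: the LeakyReLU(...) call. Group 4: the rest of the line.
const find = /^(\w+)( = .*activation=)(LeakyReLU\([^)]*\))(.*)$/;

// Swap in a placeholder activation ('relu' is an assumption) and
// append the original activation as a commented-out line.
function moveActivation(sourceLine) {
  return sourceLine.replace(find, (match, name, head, activation, tail) =>
    `${name}${head}'relu'${tail}\n# ${name} = ${activation}(${name})`
  );
}

console.log(moveActivation(line));
// conv1 = Conv2D(64, (3, 3), activation='relu', padding='valid', kernel_initializer='he_normal')(inputs)
// # conv1 = LeakyReLU(alpha=a)(conv1)
```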

Use word boundaries (\b).
A version using four escapes works in my environment: Mac, Safari version 10.0.3 (12602.4.8).
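The code from this answer is not shown, so the JavaScript sketch below only illustrates why backslashes multiply. In a string literal "\b" is a backspace character, so a pattern passed to the RegExp constructor needs "\\b"; if that string passes through a second layer of string parsing (JSON is used here as an example, which may or may not be the author's situation), each backslash doubles again to four.

```javascript
const word = "Foo"; // hypothetical glossary term

// Regex literal: a single backslash.
const viaLiteral = /\bFoo\b/i;

// String passed to the constructor: two backslashes per \b.
const viaString = new RegExp("\\b" + word + "\\b", "i");

// Forgetting to double them silently builds a pattern that looks for backspaces.
const broken = new RegExp("\b" + word + "\b", "i");

// Two layers of string parsing: four backslashes in the source text.
const patternFromJson = JSON.parse('"\\\\bFoo\\\\b"');
const viaJson = new RegExp(patternFromJson, "i");

console.log(viaString.test("a foo b")); // true
console.log(viaString.test("food"));    // false
console.log(broken.test("a foo b"));    // false
console.log(viaJson.test("a foo b"));   // true
```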
/(\s|^)TheWord(\s|$)/
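A short JavaScript sketch of how this whitespace-delimited pattern behaves, with made-up inputs. It is stricter than \b (punctuation next to the word blocks the match), and because it consumes the surrounding whitespace it can miss back-to-back occurrences in a global search; the lookaround variant shown last is one possible alternative, not part of the original answer.

```javascript
const spaced = /(\s|^)TheWord(\s|$)/;

console.log(spaced.test("TheWord"));           // true
console.log(spaced.test("start TheWord end")); // true
console.log(spaced.test("TheWords"));          // false
console.log(spaced.test("say TheWord."));      // false (a period is not whitespace)

// The match includes the delimiters themselves.
console.log(JSON.stringify("a TheWord b".match(spaced)[0])); // " TheWord "

// With the g flag the shared space is consumed by the first match,
// so the second occurrence is skipped.
const consumed = "TheWord TheWord".match(/(\s|^)TheWord(\s|$)/g).length; // 1

// Lookarounds assert the delimiters without consuming them.
const asserted = "TheWord TheWord".match(/(?<!\S)TheWord(?!\S)/g).length; // 2

console.log(consumed, asserted);
```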

Get all "words" in a string
/([^\s]+)/g
Try it:
"Not the answer you're looking for? Browse other questions tagged regex word-boundary or ask your own question.".match(/([^\s]+)/g)
→ (17) ['Not', 'the', 'answer', "you're", 'looking', 'for?', 'Browse', 'other', 'questions', 'tagged', 'regex', 'word-boundary', 'or', 'ask', 'your', 'own', 'question.']