Regex to match whole words only

Posted 2024-08-11 17:11:38

I have a regular expression that I use to find, case-insensitively, all the words in a given block of content that appear in a glossary stored in a database. Here's my pattern:

/($word)/i

The problem is, if I use /(Foo)/i then words like Food get matched. There needs to be whitespace or a word boundary on both sides of the word.

How can I modify my expression to match only the word Foo when it is a word at the beginning, middle, or end of a sentence?

不爱素颜 2024-08-18 17:11:38

Use word boundaries:

/\b($word)\b/i

Or if you're searching for "S.P.E.C.T.R.E." like in Sinan Ünür's example:

/(?:\W|^)(\Q$word\E)(?:\W|$)/i
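JavaScript has no `\Q...\E`, so a rough equivalent of the second pattern needs a hand-written escaping step. The sketch below is only an illustration: `escapeRegExp` and `wholeWordRegex` are hypothetical helper names, not something from this thread.

```javascript
// Hypothetical helper: backslash-escape every regex metacharacter so the
// glossary term is matched literally (the role \Q...\E plays in Perl).
function escapeRegExp(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// JavaScript counterpart of /(?:\W|^)(\Q$word\E)(?:\W|$)/i.
function wholeWordRegex(term) {
  return new RegExp('(?:\\W|^)(' + escapeRegExp(term) + ')(?:\\W|$)', 'i');
}

const fooRe = wholeWordRegex('Foo');
console.log(fooRe.test('I like foo.'));   // true
console.log(fooRe.test('I like Food.'));  // false

const m = wholeWordRegex('S.P.E.C.T.R.E.').exec('S.P.E.C.T.R.E. (Special Executive ...)');
console.log(m[1]);  // S.P.E.C.T.R.E.
```

Because the delimiters on either side are consumed by the match, read the term from capture group 1 rather than from the whole match.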

↙厌世 2024-08-18 17:11:38

To match any whole word, use the pattern (\w+). Note that \w also matches digits and underscores, so a "word" here is any run of letters, digits, or underscores.

This assumes PCRE or a similar engine. Live example:
https://regex101.com/r/FGheKd/1

Matching any whole word on the commandline with (\w+)

I'll use the phpsh interactive shell on Ubuntu 12.10 to demonstrate the PCRE regex engine through PHP's preg_match function.

Start phpsh, put some content into a variable, match on word.

el@apollo:~/foo$ phpsh

php> $content1 = 'badger';
php> $content2 = '1234';
php> $content3 = '$%^&';

php> echo preg_match('(\w+)', $content1);
1

php> echo preg_match('(\w+)', $content2);
1

php> echo preg_match('(\w+)', $content3);
0

The preg_match function uses the PCRE engine built into PHP to test $content1, $content2, and $content3 against the (\w+) pattern.

$content1 and $content2 each contain at least one word character, so they match; $content3 does not.

Match a number of literal words on the commandline with (dart|fart)

el@apollo:~/foo$ phpsh

php> $gun1 = 'dart gun';
php> $gun2 = 'fart gun';
php> $gun3 = 'farty gun';
php> $gun4 = 'unicorn gun';

php> echo preg_match('(dart|fart)', $gun1);
1

php> echo preg_match('(dart|fart)', $gun2);
1

php> echo preg_match('(dart|fart)', $gun3);
1

php> echo preg_match('(dart|fart)', $gun4);
0

$gun1 and $gun2 contain the string dart or fart; $gun4 does not. However, $gun3 also matches, because searching for fart finds it inside farty. To fix this, enforce word boundaries in the regex.

Match literal words on the commandline with word boundaries.

el@apollo:~/foo$ phpsh

php> $gun1 = 'dart gun';
php> $gun2 = 'fart gun';
php> $gun3 = 'farty gun';
php> $gun4 = 'unicorn gun';

php> echo preg_match('(\bdart\b|\bfart\b)', $gun1);
1

php> echo preg_match('(\bdart\b|\bfart\b)', $gun2);
1

php> echo preg_match('(\bdart\b|\bfart\b)', $gun3);
0

php> echo preg_match('(\bdart\b|\bfart\b)', $gun4);
0

The results are the same as in the previous example, except that $gun3 no longer matches: in farty, fart is followed by y, so there is no word boundary after it.
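For a glossary with many terms, the same idea can be applied once around a group instead of repeating \b for every alternative. This is a minimal sketch in JavaScript rather than PHP; the `glossary` array and the `findTerms` function are made-up names standing in for the database lookup described in the question.

```javascript
// Hypothetical glossary; in the question these terms come from a database.
const glossary = ['dart', 'fart'];

// Escape regex metacharacters in each term before joining with |.
const escaped = glossary.map(t => t.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));

// One \b on each side of a non-capturing group covers every alternative.
const termRe = new RegExp('\\b(?:' + escaped.join('|') + ')\\b', 'gi');

function findTerms(content) {
  // match() with the g flag returns all matches, or null when there are none.
  return content.match(termRe) || [];
}

console.log(findTerms('dart gun'));            // [ 'dart' ]
console.log(findTerms('farty gun'));           // []
console.log(findTerms('A Dart and a fart.'));  // [ 'Dart', 'fart' ]
```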

沉鱼一梦 2024-08-18 17:11:38

Using \b can yield surprising results. You would be better off figuring out what separates a word from its definition and incorporating that information into your pattern.

#!/usr/bin/perl

use strict; use warnings;

use re 'debug';

my $str = 'S.P.E.C.T.R.E. (Special Executive for Counter-intelligence,
Terrorism, Revenge and Extortion) is a fictional global terrorist
organisation';

my $word = 'S.P.E.C.T.R.E.';

if ( $str =~ /\b(\Q$word\E)\b/ ) {
    print $1, "\n";
}

Output:

Compiling REx "\b(S\.P\.E\.C\.T\.R\.E\.)\b"
Final program:
   1: BOUND (2)
   2: OPEN1 (4)
   4:   EXACT  (9)
   9: CLOSE1 (11)
  11: BOUND (12)
  12: END (0)
anchored "S.P.E.C.T.R.E." at 0 (checking anchored) stclass BOUND minlen 14
Guessing start of match in sv for REx "\b(S\.P\.E\.C\.T\.R\.E\.)\b" against "S.P
.E.C.T.R.E. (Special Executive for Counter-intelligence,"...
Found anchored substr "S.P.E.C.T.R.E." at offset 0...
start_shift: 0 check_at: 0 s: 0 endpos: 1
Does not contradict STCLASS...
Guessed: match at offset 0
Matching REx "\b(S\.P\.E\.C\.T\.R\.E\.)\b" against "S.P.E.C.T.R.E. (Special Exec
utive for Counter-intelligence,"...
   0           |  1:BOUND(2)
   0           |  2:OPEN1(4)
   0           |  4:EXACT (9)
  14      |  9:CLOSE1(11)
  14      | 11:BOUND(12)
                                  failed...
Match failed
Freeing REx: "\b(S\.P\.E\.C\.T\.R\.E\.)\b"
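
The match fails because \b only succeeds between a word character and a non-word character. The term ends in ".", and the next character is a space, so both sides are non-word characters and no boundary exists there. The following JavaScript sketch reproduces the failure and shows one possible alternative using lookarounds; it assumes an engine that supports lookbehind (ES2018 or later), and the variable names are illustrative only.

```javascript
const str = 'S.P.E.C.T.R.E. (Special Executive for Counter-intelligence, ' +
            'Terrorism, Revenge and Extortion) is a fictional global terrorist organisation';
const word = 'S.P.E.C.T.R.E.';

// Escape metacharacters by hand; JavaScript has no \Q...\E.
const quoted = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Same shape as the Perl pattern: fails, because the final "." and the
// following space are both non-word characters, so no \b exists there.
const withBoundary = new RegExp('\\b(' + quoted + ')\\b');
console.log(withBoundary.test(str));  // false

// Alternative: only require that no word character touches the term.
const withLookaround = new RegExp('(?<!\\w)(' + quoted + ')(?!\\w)');
console.log(withLookaround.exec(str)[1]);  // S.P.E.C.T.R.E.
```

Unlike a pattern that consumes the surrounding characters, the lookarounds are zero-width, so the whole match is exactly the term.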

内心激荡 2024-08-18 17:11:38

For those who want to validate an enum value in their code, you can use the following approach.

In a regex, ^ anchors the match at the start of the string and $ anchors it at the end. Combining them with | may be what you want:

^(Male)$|^(Female)$

It matches only when the entire string is exactly Male or Female.
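
As a small illustration in JavaScript, the two anchored alternatives can also be written with a single pair of anchors around a non-capturing group. The name `genderRe` is just a placeholder for this sketch.

```javascript
// Anchors around a non-capturing group: the whole string must be one option.
const genderRe = /^(?:Male|Female)$/;

console.log(genderRe.test('Male'));      // true
console.log(genderRe.test('Female'));    // true
console.log(genderRe.test('Males'));     // false
console.log(genderRe.test('A Female'));  // false
```

Add the i flag if the check should ignore case.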

会傲 2024-08-18 17:11:38

If you are doing this in Notepad++,

[\w]+

gives you the entire word, and you can add parentheses to capture it as a group. Example: conv1 = Conv2D(64, (3, 3), activation=LeakyReLU(alpha=a), padding='valid', kernel_initializer='he_normal')(inputs). I would like to move LeakyReLU onto its own line as a comment and replace the current activation. In Notepad++ this can be done with the following find pattern:

([\w]+)( = .+)(LeakyReLU.alpha=a.)(.+)

and the replace command becomes:

\1\2'relu'\4 \n    # \1 = LeakyReLU\(alpha=a\)\(\1\)

The spaces are there to keep the right formatting in my code. :)
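
The same find-and-replace can be sketched outside Notepad++. The JavaScript below is an approximation, not a transcription of the editor's behaviour: it escapes the parentheses in the find pattern explicitly and uses a replacer function instead of \1..\4 backreferences.

```javascript
const line = "conv1 = Conv2D(64, (3, 3), activation=LeakyReLU(alpha=a), " +
             "padding='valid', kernel_initializer='he_normal')(inputs)";

// Groups: 1 = variable name, 2 = text up to the activation,
// 3 = the LeakyReLU call, 4 = the rest of the line.
const find = /(\w+)( = .+)(LeakyReLU\(alpha=a\))(.+)/;

// The replacer receives the whole match followed by the four groups.
const replaced = line.replace(find, (match, name, head, activation, tail) =>
  `${name}${head}'relu'${tail}\n    # ${name} = ${activation}(${name})`);

console.log(replaced);
// conv1 = Conv2D(64, (3, 3), activation='relu', padding='valid', kernel_initializer='he_normal')(inputs)
//     # conv1 = LeakyReLU(alpha=a)(conv1)
```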

慕烟庭风 2024-08-18 17:11:38

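Use word boundaries (\b).

When the pattern is built with the RegExp constructor, each backslash has to be escaped inside the string literal. In a plain JavaScript string literal two backslashes are enough. (In my environment, Mac, Safari version 10.0.3 (12602.4.8), I needed four, '\\\\b', presumably because the code passed through another layer of string escaping.)

var myReg = new RegExp('\\b' + variable + '\\b', 'g')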
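
How many backslashes the RegExp constructor needs can be checked directly. This is a small sketch for a plain JavaScript source file with no extra escaping layer; `variable` is given a made-up value here.

```javascript
const variable = 'foo';  // hypothetical search term

// In a string literal, '\\b' is the two characters \b, which the
// RegExp constructor reads as a word boundary.
const twoSlashes = new RegExp('\\b' + variable + '\\b', 'g');
console.log(twoSlashes.source);                        // \bfoo\b
console.log('foo food foo'.match(twoSlashes).length);  // 2

// '\\\\b' is the three characters \\b: a literal backslash, then "b".
const fourSlashes = new RegExp('\\\\b' + variable + '\\\\b', 'g');
console.log(fourSlashes.test('foo'));  // false
```

If `variable` can contain regex metacharacters, it also needs to be escaped before concatenation.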

北斗星光 2024-08-18 17:11:38

/(\s|^)TheWord(\s|$)/

console.log(/(\s|^)TheWord(\s|$)/i.test(' TheWord '));
console.log(/(\s|^)TheWord(\s|$)/i.test(' TheWord'));
console.log(/(\s|^)TheWord(\s|$)/i.test('TheWord'));
console.log(/(\s|^)TheWord(\s|$)/i.test(' this is TheWord '));
console.log(/(\s|^)TheWord(\s|$)/i.test(' this is TheWord'));
console.log(/(\s|^)TheWord(\s|$)/i.test('this is TheWord'));
console.log(/(\s|^)TheWord(\s|$)/i.test(' anything '));
console.log(/(\s|^)TheWord(\s|$)/i.test(' anything'));
console.log(/(\s|^)TheWord(\s|$)/i.test('anything'));
console.log(/(\s|^)TheWord(\s|$)/i.test(' this is anything '));
console.log(/(\s|^)TheWord(\s|$)/i.test(' this is anything'));
console.log(/(\s|^)TheWord(\s|$)/i.test('this is anything'));
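
One limitation of delimiting on whitespace alone is that punctuation next to the word prevents a match. The sketch below compares the pattern above with a \b version; the names `spaced` and `bounded` are illustrative.

```javascript
const spaced = /(\s|^)TheWord(\s|$)/i;
const bounded = /\bTheWord\b/i;

// Punctuation directly after the word defeats the whitespace-only pattern.
console.log(spaced.test('this is TheWord.'));   // false
console.log(bounded.test('this is TheWord.'));  // true

// Both reject the word embedded in a longer word.
console.log(spaced.test('TheWordy'));   // false
console.log(bounded.test('TheWordy'));  // false
```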

小嗷兮 2024-08-18 17:11:38

Get all "words" in a string

/([^\s]+)/g

[^\s]+ matches runs of non-whitespace characters, so the string is effectively split on whitespace.
Don't forget the g (global) flag, so that every match is returned rather than just the first.

Try it:

"Not the answer you're looking for? Browse other questions tagged regex word-boundary or ask your own question.".match(/([^\s]+)/g)

→ (17) ['Not', 'the', 'answer', "you're", 'looking', 'for?', 'Browse', 'other', 'questions', 'tagged', 'regex', 'word-boundary', 'or', 'ask', 'your', 'own', 'question.']
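
The effect of the g flag, and the difference between splitting on whitespace and matching word characters, can be seen in a short sketch. The sample `text` is a shortened version of the sentence above.

```javascript
const text = "Not the answer you're looking for?";

// Without g, match() returns only the first match (plus capture groups).
console.log(text.match(/[^\s]+/)[0]);  // Not

// With g, match() returns every match; \S is shorthand for [^\s].
console.log(text.match(/\S+/g));
// [ 'Not', 'the', 'answer', "you're", 'looking', 'for?' ]

// \w+ also breaks on punctuation, so "you're" becomes two tokens.
console.log(text.match(/\w+/g));
// [ 'Not', 'the', 'answer', 'you', 're', 'looking', 'for' ]
```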
