Regex match entire words only

Posted on 2024-08-11 17:11:38

I have a regular expression that I'm using to find all the words in a given block of content, case-insensitively, that are contained in a glossary stored in a database. Here's my pattern:

/($word)/i

The problem is, if I use /(Foo)/i then words like Food get matched. There needs to be whitespace or a word boundary on both sides of the word.

How can I modify my expression to match only the word Foo when it is a word at the beginning, middle, or end of a sentence?


不爱素颜 2024-08-18 17:11:38

Use word boundaries:

/\b($word)\b/i

Or if you're searching for "S.P.E.C.T.R.E." like in Sinan Ünür's example:

/(?:\W|^)(\Q$word\E)(?:\W|$)/i
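
As a rough illustration of the word-boundary approach, here is a minimal Python sketch (Python rather than the Perl/PHP syntax used in the question). The helper name `contains_whole_word` is made up for this example, and `re.escape` stands in for `\Q...\E`, which Python's `re` module does not support.

```python
import re

def contains_whole_word(word: str, text: str) -> bool:
    """Return True if `word` occurs in `text` as a whole word, ignoring case."""
    # re.escape makes any regex metacharacters in the term match literally.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return pattern.search(text) is not None

print(contains_whole_word("Foo", "I like food"))          # False
print(contains_whole_word("Foo", "foo is at the start"))  # True
print(contains_whole_word("Foo", "ends with FOO."))       # True
```

Escaping the term matters whenever glossary entries can contain characters such as `.` or `+`.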


↙厌世 2024-08-18 17:11:38

To match any whole word you would use the pattern `(\w+)`

Assuming you are using PCRE or something similar:

Live example:
https://regex101.com/r/FGheKd/1

Matching any whole word on the commandline with `(\w+)`

I'll use the phpsh interactive shell on Ubuntu 12.10 to demonstrate the PCRE regex engine through PHP's preg_match function.

Start phpsh, put some content into a variable, match on word.

el@apollo:~/foo$ phpsh

php> $content1 = 'badger';
php> $content2 = '1234';
php> $content3 = '$%^&';

php> echo preg_match('(\w+)', $content1);
1

php> echo preg_match('(\w+)', $content2);
1

php> echo preg_match('(\w+)', $content3);
0

The preg_match function uses the PCRE engine within PHP to test the variables $content1, $content2 and $content3 against the (\w+) pattern. The parentheses double as the pattern delimiters here, so the effective pattern is \w+.

$content1 and $content2 contain at least one word character; $content3 does not.

Match a number of literal words on the commandline with `(dart|fart)`

el@apollo:~/foo$ phpsh

php> $gun1 = 'dart gun';
php> $gun2 = 'fart gun';
php> $gun3 = 'farty gun';
php> $gun4 = 'unicorn gun';

php> echo preg_match('(dart|fart)', $gun1);
1

php> echo preg_match('(dart|fart)', $gun2);
1

php> echo preg_match('(dart|fart)', $gun3);
1

php> echo preg_match('(dart|fart)', $gun4);
0

The variables $gun1 and $gun2 contain the string dart or fart; $gun4 does not. However, $gun3 also matches, because searching for fart finds it inside farty. To fix this, enforce word boundaries in the regex.

Match literal words on the commandline with word boundaries.

el@apollo:~/foo$ phpsh

php> $gun1 = 'dart gun';
php> $gun2 = 'fart gun';
php> $gun3 = 'farty gun';
php> $gun4 = 'unicorn gun';

php> echo preg_match('(\bdart\b|\bfart\b)', $gun1);
1

php> echo preg_match('(\bdart\b|\bfart\b)', $gun2);
1

php> echo preg_match('(\bdart\b|\bfart\b)', $gun3);
0

php> echo preg_match('(\bdart\b|\bfart\b)', $gun4);
0

This is the same as the previous example, except that farty no longer matches: with \b word boundaries, fart must appear as a whole word.
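
For a glossary with many terms, the alternation can be built programmatically. The following is a sketch in Python, not part of the original answer; `glossary_pattern` is a hypothetical helper, and the term list is simply the one from the examples above.

```python
import re

def glossary_pattern(terms):
    """Build one case-insensitive pattern matching any term as a whole word."""
    # Try longer terms first so a longer term wins over a shorter one it starts with.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in ordered)
    # The \b anchors are factored out of the alternation.
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

pattern = glossary_pattern(["dart", "fart"])
for text in ["dart gun", "fart gun", "farty gun", "unicorn gun"]:
    print(text, "->", pattern.findall(text))
```

Compiling a single pattern for the whole glossary avoids scanning the content once per term.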


沉鱼一梦 2024-08-18 17:11:38

Using \b can yield surprising results. You would be better off figuring out what separates a word from its definition and incorporating that information into your pattern.

#!/usr/bin/perl

use strict; use warnings;

use re 'debug';

my $str = 'S.P.E.C.T.R.E. (Special Executive for Counter-intelligence,
Terrorism, Revenge and Extortion) is a fictional global terrorist
organisation';

my $word = 'S.P.E.C.T.R.E.';

if ( $str =~ /\b(\Q$word\E)\b/ ) {
    print $1, "\n";
}

Output:

Compiling REx "\b(S\.P\.E\.C\.T\.R\.E\.)\b"
Final program:
   1: BOUND (2)
   2: OPEN1 (4)
   4:   EXACT  (9)
   9: CLOSE1 (11)
  11: BOUND (12)
  12: END (0)
anchored "S.P.E.C.T.R.E." at 0 (checking anchored) stclass BOUND minlen 14
Guessing start of match in sv for REx "\b(S\.P\.E\.C\.T\.R\.E\.)\b" against "S.P
.E.C.T.R.E. (Special Executive for Counter-intelligence,"...
Found anchored substr "S.P.E.C.T.R.E." at offset 0...
start_shift: 0 check_at: 0 s: 0 endpos: 1
Does not contradict STCLASS...
Guessed: match at offset 0
Matching REx "\b(S\.P\.E\.C\.T\.R\.E\.)\b" against "S.P.E.C.T.R.E. (Special Exec
utive for Counter-intelligence,"...
   0           |  1:BOUND(2)
   0           |  2:OPEN1(4)
   0           |  4:EXACT (9)
  14      |  9:CLOSE1(11)
  14      | 11:BOUND(12)
                                  failed...
Match failed
Freeing REx: "\b(S\.P\.E\.C\.T\.R\.E\.)\b"
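
The trace fails at the final BOUND because the term ends in ".", a non-word character, and the next character is a space: \b only matches between a word character and a non-word character. One possible workaround, sketched here in Python as an assumption rather than as the answer's own recommendation, is to replace \b with lookarounds that only require the neighbouring characters not to be word characters.

```python
import re

text = ("S.P.E.C.T.R.E. (Special Executive for Counter-intelligence, "
        "Terrorism, Revenge and Extortion) is a fictional global terrorist organisation")
word = "S.P.E.C.T.R.E."

# \b after the trailing "." fails: "." and " " are both non-word characters.
with_b = re.search(r"\b" + re.escape(word) + r"\b", text)

# Lookarounds only assert that no word character touches the term on either side.
with_lookaround = re.search(r"(?<!\w)" + re.escape(word) + r"(?!\w)", text)

print(with_b)                    # None
print(with_lookaround.group(0))  # S.P.E.C.T.R.E.
```

Unlike `(?:\W|^)...(?:\W|$)`, the lookarounds do not consume the surrounding characters, so the match contains only the term itself.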


内心激荡 2024-08-18 17:11:38

For those who want to validate an enum value in code:

In regex, ^ anchors the match at the start of the string and $ at the end. Combining them with | may be what you want:

^(Male)$|^(Female)$

This matches only when the entire string is Male or Female.
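
In Python the same whole-string check can be sketched with `re.fullmatch`, which makes the anchors implicit. The names `GENDER` and `is_valid` are illustrative only.

```python
import re

GENDER = re.compile(r"Male|Female")

def is_valid(value: str) -> bool:
    # fullmatch succeeds only if the pattern covers the entire string.
    return GENDER.fullmatch(value) is not None

print(is_valid("Male"))    # True
print(is_valid("Female"))  # True
print(is_valid("Males"))   # False
print(is_valid("male"))    # False (the pattern is case-sensitive)
```

One difference worth knowing: in Python, `$` also matches just before a trailing newline, whereas `fullmatch` rejects "Male\n".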


会傲 2024-08-18 17:11:38

If you are doing this in Notepad++,

[\w]+

gives you the entire word, and you can add parentheses to capture it as a group. Example: conv1 = Conv2D(64, (3, 3), activation=LeakyReLU(alpha=a), padding='valid', kernel_initializer='he_normal')(inputs). I would like to move LeakyReLU onto its own line as a comment and replace the current activation. In Notepad++ this can be done using the following find command:

([\w]+)( = .+)(LeakyReLU.alpha=a.)(.+)

and the replace command becomes:

\1\2'relu'\4 \n    # \1 = LeakyReLU\(alpha=a\)\(\1\)

The spaces are there to keep the right formatting in my code. :)
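
The same find-and-replace can be tried outside the editor. This is a Python sketch of the idea using `re.sub`; it mirrors the Notepad++ expressions above but uses unescaped parentheses in the replacement, since Python's replacement syntax does not require escaping them.

```python
import re

line = ("conv1 = Conv2D(64, (3, 3), activation=LeakyReLU(alpha=a), "
        "padding='valid', kernel_initializer='he_normal')(inputs)")

# Group 1: variable name, group 2: everything up to the activation,
# group 3: the LeakyReLU call, group 4: the rest of the line.
find = r"(\w+)( = .+)(LeakyReLU.alpha=a.)(.+)"

# Back-references reuse the captured groups; the newline starts the comment line.
replace = r"\1\2'relu'\4" + "\n    # " + r"\1 = LeakyReLU(alpha=a)(\1)"

result = re.sub(find, replace, line)
print(result)
```

The first output line carries activation='relu', and the second is the commented-out LeakyReLU call applied to conv1.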


慕烟庭风 2024-08-18 17:11:38

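Use word boundaries (\b).

When the pattern is built from a string, the backslash itself has to be escaped in the string literal. In a plain JavaScript source file two backslashes are enough:

var myReg = new RegExp('\\b' + variable + '\\b', 'g')

The four-backslash form ('\\\\b') worked in my environment (Mac, Safari Version 10.0.3 (12602.4.8)); it is only needed when the code passes through an additional layer of string escaping before it runs.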
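
The same escaping layer exists in other languages. As an illustration in Python (not JavaScript), an unescaped "\b" in an ordinary string literal is a backspace character, so the regex engine never sees a word boundary. The variable names here are hypothetical.

```python
import re

variable = "foo"  # hypothetical search term

# In a normal string literal, "\b" is the backspace character (U+0008).
wrong = re.compile("\b" + re.escape(variable) + "\b")
# Doubling the backslash passes a real \b to the regex engine.
right = re.compile("\\b" + re.escape(variable) + "\\b")
# A raw string achieves the same without doubling.
raw = re.compile(r"\b" + re.escape(variable) + r"\b")

print(wrong.search("a foo b"))           # None
print(right.search("a foo b").group(0))  # foo
print(raw.pattern == right.pattern)      # True
```

Printing the pattern string before compiling it is a quick way to check how many backslashes actually reach the regex engine.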


北斗星光 2024-08-18 17:11:38

/(\s|^)TheWord(\s|$)/

console.log(/(\s|^)TheWord(\s|$)/i.test(' TheWord '));
console.log(/(\s|^)TheWord(\s|$)/i.test(' TheWord'));
console.log(/(\s|^)TheWord(\s|$)/i.test('TheWord'));
console.log(/(\s|^)TheWord(\s|$)/i.test(' this is TheWord '));
console.log(/(\s|^)TheWord(\s|$)/i.test(' this is TheWord'));
console.log(/(\s|^)TheWord(\s|$)/i.test('this is TheWord'));
console.log(/(\s|^)TheWord(\s|$)/i.test(' anything '));
console.log(/(\s|^)TheWord(\s|$)/i.test(' anything'));
console.log(/(\s|^)TheWord(\s|$)/i.test('anything'));
console.log(/(\s|^)TheWord(\s|$)/i.test(' this is anything '));
console.log(/(\s|^)TheWord(\s|$)/i.test(' this is anything'));
console.log(/(\s|^)TheWord(\s|$)/i.test('this is anything'));
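
One caveat with this pattern is that it consumes the surrounding whitespace, which matters when collecting every occurrence rather than just testing for one. The Python sketch below compares it with a lookaround variant; the lookaround form is an alternative suggested here, not part of the answer.

```python
import re

text = "TheWord TheWord TheWordy"

# Consuming version: the space after the first match is used up,
# so the second occurrence has no leading whitespace left to match.
consuming = re.compile(r"(?:\s|^)TheWord(?:\s|$)", re.IGNORECASE)

# Lookaround version: asserts "no non-space neighbour" without consuming anything.
lookaround = re.compile(r"(?<!\S)TheWord(?!\S)", re.IGNORECASE)

print(len(consuming.findall(text)))   # 1
print(len(lookaround.findall(text)))  # 2
```

For a simple yes/no test, as in the console.log calls above, both forms give the same answer.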


小嗷兮 2024-08-18 17:11:38

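Get all "words" in a string:

/([^\s]+)/g

Basically, [^\s] matches any non-whitespace character, so the pattern matches runs of non-whitespace, i.e. whitespace-delimited words.
Don't forget the g flag, which stands for global: it finds every match, not just the first.

Try it:

"Not the answer you're looking for? Browse other questions tagged regex word-boundary or ask your own question.".match(/([^\s]+)/g)

→ (17) ['Not', 'the', 'answer', "you're", 'looking', 'for?', 'Browse', 'other', 'questions', 'tagged', 'regex', 'word-boundary', 'or', 'ask', 'your', 'own', 'question.']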

About the author: 与君绝
