Is there a way to match all adjacent words in a sentence?

Posted 2025-02-06 13:35:45

my $line = "The quick brown fox jumps over the lazy dog.";
my %wordcount;

$line =~ s/[",]//g;    # remove double quotes and commas
$line = lc($line);     # lc converts to lowercase
while ($line =~ m/\b(\w+\s\w+)\b/g) {    # match two words separated by one whitespace character
    my $word = $1;
    print "$word\n";
    $wordcount{$word} += 1;
}

The desired output is: the quick, quick brown, brown fox, fox jumps, ... However, with the code above I only get: the quick, brown fox, jumps over, ...

Comments (3)

悟红尘 2025-02-13 13:35:45

You can capture both words but consume only the first by using a lookahead, so that the pairs overlap:

use warnings;
use strict;
use feature 'say';

my $string = shift // 'The quick brown fox jumps over the lazy dog.';
 
while ( $string =~ /(\w+)\s+(?=(\w+))/g ) { 
   say "$1 $2";
}

This prints the pairs as desired, and it allows any amount of whitespace between words.


An explanation.

After a word is captured with (\w+), the lookahead (?=...) merely asserts ("looks ahead") that another word follows; it neither "consumes" that word nor advances past it. Because the word inside the lookahead is wrapped in its own (extra) capturing parentheses, we still get the two words in $1 and $2. Only the first word is consumed, so the regex engine stops right after the whitespace that follows it.

So in the next iteration it can match the next word, the one last "seen" by the lookahead. The lookahead then spots the word after that, and both are captured again. This repeats until no word follows, which is why the pairs overlap.
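To make the "consumed versus merely seen" distinction visible, here is a minimal sketch that records pos() after every match, for both the question's pattern and the lookahead pattern. The helper name trace_matches and the shortened sample string are assumptions made for illustration; they are not part of the answer.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper (not part of the answer above): run a pattern with /g
# and record each match plus pos(), the offset where the next search starts.
sub trace_matches {
    my ($string, $re) = @_;
    my @trace;
    while ( $string =~ /$re/g ) {
        my $text = defined $2 ? "$1 $2" : $1;   # one or two capture groups
        push @trace, [ $text, pos($string) ];
    }
    return @trace;
}

my $string = 'the quick brown fox';

say 'Consuming both words:';
say "  '$_->[0]' -> next search at offset $_->[1]"
    for trace_matches($string, qr/\b(\w+\s\w+)\b/);

say 'Consuming one word, peeking at the next:';
say "  '$_->[0]' -> next search at offset $_->[1]"
    for trace_matches($string, qr/(\w+)\s+(?=(\w+))/);
```

With the question's pattern the next search starts at offsets 9 and 19, that is, after the second word of each pair, so "quick brown" can never be found. With the lookahead the offsets are 4, 10 and 16, each just past the first word and its trailing space.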


Drop that + and use only \s if you indeed want to allow exactly one whitespace character. If you want a literal space only (no tabs, newlines, or other characters that \s matches), then instead of \s use a plain literal space, or [ ], a literal space inside a "character class" (brackets), for clarity.
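The three whitespace variants can be compared side by side. The following is only a sketch: the helper name pairs and the sample text, which mixes a tab, a double space and a single space, are made up for this illustration.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: collect the "first second" pairs that a pattern finds.
sub pairs {
    my ($string, $re) = @_;
    my @pairs;
    while ( $string =~ /$re/g ) {
        push @pairs, "$1 $2";
    }
    return @pairs;
}

# Separators: a tab, then two spaces, then one space
my $text = "one\ttwo  three four";

say join ', ', pairs($text, qr/(\w+)\s+(?=(\w+))/);   # any run of whitespace
say join ', ', pairs($text, qr/(\w+)\s(?=(\w+))/);    # exactly one whitespace character
say join ', ', pairs($text, qr/(\w+)[ ](?=(\w+))/);   # exactly one literal space
```

The first pattern finds all three pairs. The second misses "two three" because two spaces separate those words. The third finds only "three four", since a tab is not a literal space.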

北恋 2025-02-13 13:35:45

You can use

(\w+)\s(?=(\w+\b))

Regex Explanation

  • ( Capturing group
    • \w+ Match a word
  • ) Close group
  • \s Match a single whitespace character (space, tab, newline, etc.)
  • (?= Lookahead assertion - assert that the following regex matches
    • ( Capturing group
      • \w+\b Match a word, up to a word boundary
    • ) Close group
  • ) Close lookahead

Perl Example

my $line = "The quick brown fox jumps over the lazy dog.";

while ($line =~ /(\w+)\s(?=(\w+\b))/g) {
    print("$1 $2\n");
}

Output

The quick
quick brown
brown fox
fox jumps
jumps over
over the
the lazy
lazy dog

野心澎湃 2025-02-13 13:35:45

You don't need to do anything fancy with regular expressions at all if you split the string up into an array of words:

#!/usr/bin/env perl
use strict;
use warnings;
use feature qw/say/;

my $line = "The quick brown fox jumps over the lazy dog.";
$line =~ s/[^\w\s]//g; # Remove non-word, non-whitespace characters
my @words = split ' ', $line;
for my $i (0 .. $#words - 1) {
    say "$words[$i] $words[$i + 1]";
}
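Since the question also tallies each pair in %wordcount, the split approach could be extended to count pairs, roughly as sketched below. The helper name count_pairs and the sample sentence are assumptions for illustration. Lowercasing follows the question's code, and punctuation is stripped as in the answer above.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: count each lowercased adjacent-word pair in a line.
sub count_pairs {
    my ($line) = @_;
    $line = lc $line;                  # treat "The" and "the" as the same word
    $line =~ s/[^\w\s]//g;             # remove non-word, non-whitespace characters
    my @words = split ' ', $line;
    my %count;
    $count{"$words[$_] $words[$_ + 1]"}++ for 0 .. $#words - 1;
    return %count;
}

my %wordcount = count_pairs('The cat saw the cat.');
say "$_: $wordcount{$_}" for sort keys %wordcount;
```

Here "the cat" is counted twice, while "cat saw" and "saw the" are counted once each. A line with fewer than two words yields an empty hash, because the index range is then empty.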