Extracting words with Perl

Posted on 2024-12-18 10:21:25

I would like to extract the words from a text. I have written a simple regex:

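my @words;
my $regex = qr/\W+/;    # split on runs of non-word characters
while (<DATA>) {
    # grep drops the empty leading field produced when a line starts with punctuation
    push @words, grep { length } split $regex;
}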

I would like to modify it so that proper names are kept together. A proper name may combine multiple 'words', for example:

my @names = ('John Smith', 'Joe Smith');

不顾 2024-12-25 10:21:25

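I don't think there is a clear-cut solution. Regular expressions reach their limits in complex text such as web pages or books, which are full of exceptions: what about book titles, for instance? Consider using 1) natural language processing or 2) an indexing approach, in which you identify two words that each start with a capital letter and are separated by a single space, then check whether either word appears in an index of known first or last names. Good luck.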
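
The indexing approach could be sketched roughly as follows. This is only an illustration under assumptions: the `%known_name` index and the `find_names` helper are hypothetical names, and a real index would be loaded from a much larger list of first and last names.

```perl
use strict;
use warnings;

# Hypothetical index of known first and last names (an assumption for
# this sketch; a real index would be far larger).
my %known_name = map { $_ => 1 } qw(John Joe Sally Smith Doe);

# Return every pair of capitalized words separated by a single space
# in which at least one of the two words appears in the index.
sub find_names {
    my ($text) = @_;
    my @found;
    while ($text =~ /\b([A-Z][a-z]+) ([A-Z][a-z]+)\b/g) {
        push @found, "$1 $2" if $known_name{$1} || $known_name{$2};
    }
    return @found;
}

print "$_\n" for find_names('He said John Smith read Moby Dick to Joe Smith.');
```

Run on the sample sentence, this prints John Smith and Joe Smith and skips Moby Dick, because neither of those two words is in the index. The heuristic stays fragile: a capitalized word at the start of a sentence that sits directly before a known name would be paired with it.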

拥醉 2024-12-25 10:21:25

Perhaps:

#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

my @words;
while (<DATA>) {
    # Capture the first pair of adjacent capitalized words on each line.
    push @words, $1 if m{([A-Z]\w*\s+[A-Z]\w*)};
}
for my $name (@words) {
    print "$name\n";
}
print Dumper \@words;
__DATA__
John Smith I am
He is Joe Smith 
John Doe
Sam
Sally
Sally Girl
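
If the names are already known, as in the question's `@names` list, another option might be to match them first and fall back to ordinary words. The sketch below assumes that setup; `tokenize` and `$names_re` are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

# Names to keep whole, taken from the example list in the question.
my @names = ('John Smith', 'Joe Smith');

# Build an alternation of the names, longest first, so that a longer
# name wins over a shorter one that is its prefix.
my $names_re = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @names;

# Split a text into tokens: a known name where one starts, else a word.
# Call in list context to get the list of tokens.
sub tokenize {
    my ($text) = @_;
    return $text =~ /\b(?:$names_re)\b|\w+/g;
}

print "$_\n" for tokenize('He is Joe Smith, not John Doe.');
```

For the sample sentence this prints He, is, Joe Smith, not, John and Doe on separate lines: "Joe Smith" stays one token because it is in the list, while "John Doe" is split because it is not. Names are matched exactly as written, so a name spelled with two spaces or broken across lines would be missed.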
