Perl and NLP: parsing names from biographies

Posted 2024-09-09 17:34:09

I'm pretty new to NLP in general, but getting really good at Perl, and I was wondering what kind of powerful NLP modules are out there. Basically, I have a file with a bunch of paragraphs, and some of them are people's biographies. So, first I need to look for a person's name, and that helps with the rest of the process later.

So I was roughly starting with something like this:

# @PP holds the paragraphs; @pieces holds fragments that rule a candidate out
foreach my $PPid (0 .. $PPscalar) {
    my $paragraph = $PP[$PPid];
    if ($paragraph =~ /^(\w+ \w\. \w+|\w+ \w+)( also|)( has served| served| worked| joined| currently serves| has| was| is|, )/) {
        my $possibleName = $1;
        my $badName = 0;
        foreach my $piece (@pieces) {
            if ($possibleName =~ /$piece/) {
                $badName = 1;
                last;    # one hit is enough to reject the candidate
            }
        }
        if ($badName == 0) {
            push @namePile, $possibleName;
        }
    }
}

I anchor the match at the start because most of the names appear at the beginning of the paragraphs. Then I look for keywords that denote action or possession, but right now that picks up extra junk that is not a name. There has to be a module to do this, right?
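
A minimal sketch of how the regex above could be tightened to reject obvious junk, assuming the same input (one paragraph per string). It requires every token of the candidate to be capitalized and checks the tokens against a small stop-list. The sub name `extract_name`, the `%STOP` list and its contents are hypothetical, not part of the question; tune them to your own data.

```perl
use strict;
use warnings;

# Hypothetical stop-list: capitalized words that often open a sentence
# in company prose but are not part of a person's name.
my %STOP = map { $_ => 1 } qw(The This Our Company Board He She In);

# One capitalized name token, optionally hyphenated ("Smith-Jones").
my $CAP = qr/[A-Z][a-z]+(?:-[A-Z][a-z]+)?/;

# Return the leading "First [M.] Last" name of a paragraph, or undef.
sub extract_name {
    my ($paragraph) = @_;
    return unless $paragraph =~ /
        ^($CAP (?:\s+[A-Z]\.)? \s+ $CAP)    # every token must be capitalized
        (?:\s+also)?
        (?: \s+ (?:has\s+served|served|worked|joined|currently\s+serves|has|was|is) \b | , )
    /x;
    my $name = $1;
    # Reject the candidate if any of its tokens is on the stop-list.
    return if grep { $STOP{$_} } split /\s+/, $name;
    return $name;
}

for my $p ("Jane Q. Doe has served as CEO since 2010.",
           "The Company was founded in 1999.") {
    my $name = extract_name($p);
    print defined $name ? "name: $name\n" : "no name\n";
}
```

This still misses names such as "McDonald" or text in all capitals, so it narrows the junk rather than solving the problem.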

清引 2024-09-16 17:34:09

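Extracting names from data is hard, and there are several approaches. For named-entity extraction you could use one of the following:

A simple approach. I remember looking at this and was not impressed with the output.
A dictionary approach. I have used this, but it produces many false negatives, and I don't much like the code underneath it.
An open-source binary with a Perl interface (not recommended: I am the author of this CPAN library, and it is also fiddly to set up).
The best solution is a proprietary web service, accessed through the Net::Calais Perl wrapper.

Net::Calais is by far the best option for speed and accuracy. If you need an open-source underlying implementation, use the Stanford library.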
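
To make the dictionary approach concrete, here is a minimal sketch. It is not the library the answer refers to (the text does not name it): a pair of capitalized words counts as a name only if its first word appears in a list of known given names. The `%GIVEN_NAMES` list and the `dictionary_names` sub are hypothetical; a real gazetteer would be loaded from a file with thousands of entries.

```perl
use strict;
use warnings;

# Hypothetical, deliberately tiny gazetteer of given names.
my %GIVEN_NAMES = map { $_ => 1 } qw(Jane John Mary Robert);

# Return every "Given Surname" pair whose first word is a known given name.
sub dictionary_names {
    my ($text) = @_;
    my @found;
    # Match one capitalized word and peek (without consuming) at the next one,
    # so "Chairman Robert Roe" still yields "Robert Roe".
    while ($text =~ /\b([A-Z][a-z]+)(?=\s+([A-Z][a-z]+)\b)/g) {
        push @found, "$1 $2" if $GIVEN_NAMES{$1};
    }
    return @found;
}

my @names = dictionary_names("Chairman Robert Roe and Jane Doe met Priya Patel.");
print join(", ", @names), "\n";
```

Here "Priya Patel" is missed because "Priya" is not in the list, which is the kind of false negative the answer describes.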

仅一夜美梦 2024-09-16 17:34:09

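Have you tried searching CPAN?

http://search.cpan.org/search?query=NLP&mode=all

I also tried searching for "natural language" and found the following, which may interest you:

Lingua::EN::Tagger

Also, if you have to roll your own NLP, you'll want to look at Regexp::Grammars. It is the successor to Parse::RecDescent.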

可可 2024-09-16 17:34:09

I don't know of any Perl modules which do processing of English in order to break it into parts of speech. I expect there are libraries out there which do that, in C or C++ or something, so if you don't find a good answer, maybe you can broaden your search.

One easy hack is to check for two words which are both capitalized:

if (/[A-Z][a-z]+\s+[A-Z][a-z]/) { ...

or check for titles:

if (/(?:Mr|Mrs|Ms|Dr)\.?\s+[A-Z][a-z]+/) { ...
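
Both hacks can be combined into one small helper, sketched below under the assumption that you want a de-duplicated list of candidates per paragraph. The sub name `quick_name_candidates` is hypothetical, and the second word of the capitalized-pair pattern is extended to `[a-z]+` so the whole word is captured.

```perl
use strict;
use warnings;

# Hypothetical helper applying both quick heuristics to one paragraph.
sub quick_name_candidates {
    my ($text) = @_;
    my @candidates;
    # Heuristic 1: an honorific followed by a capitalized word ("Dr. Smith").
    while ($text =~ /\b((?:Mr|Mrs|Ms|Dr)\.?\s+[A-Z][a-z]+)/g) {
        push @candidates, $1;
    }
    # Heuristic 2: two capitalized words in a row ("Jane Doe").
    while ($text =~ /\b([A-Z][a-z]+\s+[A-Z][a-z]+)\b/g) {
        push @candidates, $1;
    }
    # Drop duplicates found by both heuristics, keeping first-seen order.
    my %seen;
    return grep { !$seen{$_}++ } @candidates;
}

my @found = quick_name_candidates("Dr. Smith met Jane Doe and Mrs Jones at The Hague.");
print join(", ", @found), "\n";
```

The result includes "The Hague", which shows why this is only a hack: capitalization alone cannot tell a person from a place.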

About the author

迎风吟唱
