How to identify person names in text (Java)
I have some input text that contains one or more human names. I do not have any dictionary of these names. Which Java library can help me identify the names in my input text?
I looked at OpenNLP, but did not find any example, guide, or even a description of how to apply it in my code. (I saw the Javadoc, but it is rather poor documentation for a project like this.)
I want to find names in arbitrary text. If the input text is "My friend Joe Smith went to the store.", then I want to get "Joe Smith". I think smart engines should have sufficiently large dictionaries, built up from smaller ones, that let them recognize human names.
Answers (10)
I'd look into LingPipe. Check out this demo. By the way, what you are trying to do is called "named entity recognition". It's a difficult CS problem to get right.
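To illustrate why this is hard, here is a deliberately naive baseline in plain Java (no NLP library; the class and method names are hypothetical): treat every run of two or more capitalized words as a name. It is not NER, only a sketch of what goes wrong without it.

```java
import java.util.ArrayList;
import java.util.List;

class NaiveNameHeuristic {

    // Returns runs of two or more consecutive capitalized tokens.
    // Deliberately naive baseline, NOT real named entity recognition.
    static List<String> capitalizedRuns(String text) {
        List<String> runs = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String raw : text.split("\\s+")) {
            // Strip leading/trailing punctuation such as "." or ","
            String token = raw.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
            if (!token.isEmpty() && Character.isUpperCase(token.charAt(0))) {
                current.add(token);
            } else {
                flush(current, runs);
            }
        }
        flush(current, runs);
        return runs;
    }

    // Emits the current run if it has at least two tokens, then resets it.
    private static void flush(List<String> current, List<String> runs) {
        if (current.size() >= 2) {
            runs.add(String.join(" ", current));
        }
        current.clear();
    }

    public static void main(String[] args) {
        System.out.println(capitalizedRuns("Yesterday Joe Smith met Mary in New York."));
    }
}
```

On "Yesterday Joe Smith met Mary in New York." it returns `[Yesterday Joe Smith, New York]`: a sentence-initial word is glued onto the name, the single-word name `Mary` is missed, and a place is reported as if it were a person. These are the ambiguities NER models are trained to resolve.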
OpenNLP has named entity recognition. Check the "English Name Finding" section in the docs. In my experience, though, it identifies entities without associating tags with them. (To be precise, I found the tags to be ambiguously assigned.) So, if you have the sentence "My friend Joe Smith went to the Walmart store", OpenNLP identifies two named entities, "Joe Smith" and "Walmart", but I couldn't get it to tag "Joe Smith" as Person and "Walmart" as Organization.
As suggested by Matt, you can try LingPipe, though it's a commercial tool. Some of the open source alternatives are MorphAdorner and Stanford NER.
While we're waiting for details on what you're doing, here are a couple of links to lists of common first names, at least for the US population:
I think you'll need these (and/or more) to check against, as your task doesn't sound like something an NLP library can do for you without reference information.
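A minimal sketch of checking text against such a list, assuming you have already loaded the first names into memory (the class `FirstNameGazetteer` and its matching rule are hypothetical, not from any library): a capitalized token that matches a known first name starts a name, and any capitalized tokens immediately following it are taken as the surname.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class FirstNameGazetteer {

    // Known first names, stored lower-case for case-insensitive lookup.
    private final Set<String> firstNames = new HashSet<>();

    FirstNameGazetteer(Iterable<String> names) {
        for (String n : names) {
            firstNames.add(n.toLowerCase(Locale.ROOT));
        }
    }

    // A capitalized known first name starts a match; following
    // capitalized tokens are appended as (assumed) surnames.
    List<String> findNames(String text) {
        String[] tokens = text.split("\\s+");
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            String token = clean(tokens[i]);
            if (isCapitalized(token) && firstNames.contains(token.toLowerCase(Locale.ROOT))) {
                StringBuilder name = new StringBuilder(token);
                int j = i + 1;
                while (j < tokens.length && isCapitalized(clean(tokens[j]))) {
                    name.append(' ').append(clean(tokens[j]));
                    j++;
                }
                found.add(name.toString());
                i = j; // continue after the matched name
            } else {
                i++;
            }
        }
        return found;
    }

    // Strips leading/trailing punctuation such as "." or ","
    private static String clean(String raw) {
        return raw.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
    }

    private static boolean isCapitalized(String token) {
        return !token.isEmpty() && Character.isUpperCase(token.charAt(0));
    }

    public static void main(String[] args) {
        FirstNameGazetteer g = new FirstNameGazetteer(List.of("Joe", "Mary"));
        System.out.println(g.findNames("My friend Joe Smith went to the store."));
    }
}
```

With `Joe` and `Mary` in the list, "My friend Joe Smith went to the store." yields `[Joe Smith]`. The obvious limits: names missing from the list are never found, and a first name that is also a common word (e.g. "Will" at the start of a sentence) produces false positives.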
You can find examples of person extraction from free text here: http://code.google.com/p/graph-expression/wiki/Examples
OpenNLP has a person type in its NER models. Download the project from the OpenNLP website, and get the models from the models website (there is a link on the OpenNLP page). Then see http://www.asksunny.com/drupal/?q=node/4, which is a good example of how to load the models and perform NER. NER is never perfect, so don't be disappointed.
I would suggest using the Stanford Named Entity Recognizer (NER). Stanford NER provides several classifiers. One of them can identify person names, locations, and organizations in a given text.
You can find an online demo of Stanford NER at this link:
http://nlp.stanford.edu:8080/ner/
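Classifiers like Stanford NER label each token, so to get "Joe Smith" as one PERSON rather than two separate tokens, adjacent tokens with the same label are usually merged. The sketch below does this in plain Java, assuming you have already obtained one label per token with `O` marking non-entity tokens (as with the three classes PERSON, LOCATION, ORGANIZATION). It does not call the Stanford API, and `LabelGrouper`/`Entity` are hypothetical names. It uses a `record`, so it needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

class LabelGrouper {

    // One recognized entity: its text and its label (e.g. PERSON).
    record Entity(String text, String label) {}

    // Merges adjacent tokens that share the same non-"O" label into one entity.
    // Assumes IO-style labels where "O" marks tokens outside any entity.
    static List<Entity> group(List<String> tokens, List<String> labels) {
        if (tokens.size() != labels.size()) {
            throw new IllegalArgumentException("tokens and labels must have the same length");
        }
        List<Entity> entities = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        String currentLabel = "O";
        for (int i = 0; i < tokens.size(); i++) {
            String label = labels.get(i);
            if (!label.equals(currentLabel)) {
                // Label changed: close the open entity, if any.
                if (!currentLabel.equals("O")) {
                    entities.add(new Entity(text.toString(), currentLabel));
                }
                text.setLength(0);
                currentLabel = label;
            }
            if (!label.equals("O")) {
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(tokens.get(i));
            }
        }
        // Close an entity that runs to the end of the input.
        if (!currentLabel.equals("O")) {
            entities.add(new Entity(text.toString(), currentLabel));
        }
        return entities;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("My", "friend", "Joe", "Smith", "went", "to", "the", "Walmart", "store");
        List<String> labels = List.of("O", "O", "PERSON", "PERSON", "O", "O", "O", "ORGANIZATION", "O");
        System.out.println(group(tokens, labels));
    }
}
```

Because these labels carry no begin/inside marker, two people mentioned back to back with all tokens labeled PERSON would be merged into a single entity; BIO-style labels avoid that ambiguity.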
You can also look at the OpenCyc and WordNet projects, which are more interesting from a semantic point of view.
This problem is addressed by named entity recognition in natural language processing, and it is currently considered a fairly hard problem. However, there are many tools you can use for it. I have used Stanford NER for this, and it is good software.
The OpenCalais service may be useful. Try their submission tool at: http://www.opencalais.com/documentation/calais-submission-tool
This tool recognizes much more than just person names.
Try Stanford NER, a text processing library
http://nlp.stanford.edu:8080/ner/