How to extract accurate person names from resumes using spaCy with patterns
I'm extracting a person's name from a resume with the spaCy model en_core_web_sm, using Matcher patterns like these:
PATTERN = [
    [{'POS': 'PROPN'}, {'POS': 'PROPN'}, {'POS': 'PROPN'}],
    [{'POS': 'PROPN'}, {'POS': 'PROPN'}],
    [{'POS': 'PROPN'}, {'POS': 'NOUN'}, {'POS': 'PROPN'}],
    [{'POS': 'NOUN'}, {'POS': 'PROPN'}, {'POS': 'PROPN'}]
]
This works in some cases and returns the person's exact name, but sometimes it returns a wrong match such as "curriculum vitae", "from Resume Genius", "Sr", or "Electrical Engineer".
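import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")

def extract_name(resume_text):
    nlp_text = nlp(resume_text)
    matcher = Matcher(nlp.vocab)  # create the matcher before adding patterns
    matcher.add("NAME", PATTERN)
    for match_id, start, end in matcher(nlp_text):
        return nlp_text[start:end].text  # first match only
    return None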
This is how I'm getting the name, but it often fails to identify the actual person's name. Please suggest a solution. Thanks.
Comments (1)
Trying to match human names with a pattern like this doesn't make sense. Part-of-speech tags only say that a token is a proper noun, not that it is a person, so this is not the right way to approach the problem.
If you want to get human names, that's what the NER (Named Entity Recognition) component is for. You should use doc.ents and get all the entities with the label PERSON rather than using a pattern on part-of-speech tags. See the docs for a usage example.
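As a rough illustration of the doc.ents approach, here is a minimal sketch. The function names person_names and extract_person_names are made up for this example and are not part of spaCy; it also assumes the en_core_web_sm model from the question is installed. The NER component can still mislabel headings or company names, so treat the output as candidates rather than a guaranteed answer.

```python
def person_names(doc):
    """Return the text of every entity labelled PERSON in a spaCy Doc.

    Hypothetical helper: it only relies on doc.ents, ent.label_ and ent.text.
    """
    return [ent.text for ent in doc.ents if ent.label_ == "PERSON"]


def extract_person_names(text, model="en_core_web_sm"):
    """Run a spaCy pipeline over raw text and return its PERSON entities.

    Assumes the named model is installed, for example via
    `python -m spacy download en_core_web_sm`.
    """
    # Imported lazily so person_names() can be used on an existing Doc
    # without loading a model a second time.
    import spacy

    nlp = spacy.load(model)
    return person_names(nlp(text))
```

Calling extract_person_names(resume_text) returns a list of candidate names. If the name usually appears near the top of the resume, taking the first element of that list is one possible heuristic, though that is an assumption about the input and not something the model guarantees.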