Removing names from a data frame using spaCy in Python 3.9
I am working with the spaCy package v3.2.1 in Python 3.9 and want to understand how to use it to remove names from a data frame. I followed the spaCy documentation and can identify names correctly, but I don't understand how to remove them. My goal is to remove all names from a specific column of the data frame.
Actual
ID | Comment |
---|---|
A123 | I am five years old, and my name is John |
X907 | Today I met with Dr. Jacob |
What I am trying to accomplish
ID | Comment |
---|---|
A123 | I am five years old, and my name is |
X907 | Today I met with Dr. |
Code:
# load packages
import spacy
import pandas as pd
from spacy import displacy
# load the CSV
df = pd.read_csv('names.csv')
# load the large English spaCy model
nlp = spacy.load("en_core_web_lg")
# check whether the large model identifies named entities
df['test_col'] = df['Comment'].apply(lambda x: list(nlp(x).ents))
What my code does
ID | Comment | test_col |
---|---|---|
A123 | I am five years old, and my name is John | [(John)] |
X907 | Today I met with Dr. Jacob | [(Jacob)] |
But how do I go from identifying those names to removing them from the Comment column? I think I need some sort of function that iterates over each row of the data frame and removes the identified entities. I would appreciate your help.
Thank you
Comments (2)
You can use
Output:
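A minimal sketch of one spaCy-based way to do this, not necessarily the answerer's own code: cut each detected entity out of the text by its character offsets. The helper names remove_person_names and clean_csv are hypothetical. The sketch assumes the model labels people as PERSON (as the OntoNotes-trained English models such as en_core_web_lg do) and that the column is called Comment, as in the question.

```python
def remove_person_names(text, nlp):
    """Return text with every PERSON entity found by nlp cut out.

    nlp is any callable returning an object with .ents, where each entity
    has .label_, .start_char and .end_char (a loaded spaCy pipeline does).
    """
    doc = nlp(text)
    pieces = []
    last_end = 0
    # doc.ents is ordered by position and its spans do not overlap
    for ent in doc.ents:
        if ent.label_ != "PERSON":
            continue  # keep dates, places, organisations, etc.
        pieces.append(text[last_end:ent.start_char])
        last_end = ent.end_char
    pieces.append(text[last_end:])
    # strip() only trims the ends; a name removed mid-sentence leaves its spaces
    return "".join(pieces).strip()


def clean_csv(path="names.csv", column="Comment", model="en_core_web_lg"):
    """Hypothetical wrapper: load the CSV and strip names from one column."""
    import pandas as pd
    import spacy

    nlp = spacy.load(model)
    df = pd.read_csv(path)
    df[column] = df[column].astype(str).apply(
        lambda t: remove_person_names(t, nlp)
    )
    return df
```

Calling clean_csv() with the question's file would return the data frame with the Comment column rewritten. Which names are removed depends entirely on what the model recognises, so results on real data should be spot-checked.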
Here's an idea using the string replace method:
EDIT: Stripping the parentheses off to see if that helps.
I cast the variables to str to help with the match, since I'm not sure whether the entity is a str or not. You may need to use an index, and loop over the entities if multiple names are found in a single comment, but that's the gist of it.
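The replace idea, including the loop over several names in one comment, might look roughly like the sketch below. The helper names strip_names and strip_names_in_frame are hypothetical, and the sketch assumes the names column holds plain strings, for example built with [ent.text for ent in nlp(x).ents] rather than the Span objects stored in test_col.

```python
def strip_names(comment, names):
    """Remove every string in names from comment using str.replace."""
    cleaned = str(comment)  # cast in case the cell is not a str
    for name in names:
        cleaned = cleaned.replace(str(name), "")
    return cleaned.strip()


def strip_names_in_frame(df, text_col="Comment", names_col="test_col"):
    """Return a copy of df with each row's names removed from text_col."""
    out = df.copy()
    out[text_col] = [
        strip_names(comment, names)
        for comment, names in zip(out[text_col], out[names_col])
    ]
    return out
```

Note that str.replace removes every occurrence of the substring, so a name that also appears inside a longer word (Jacob in Jacobson, say) is cut there as well; removing by character offsets avoids that.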