Extracting information from text with spaCy
I want to build a model that extracts personal data collected by a website.
As a first step, I scraped the privacy policy of a website, then split it into sentences and put them into a dataframe, as shown in the image below:
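Roughly, that step looked like this (a simplified sketch rather than my exact code; policy_text stands in for the scraped text, and en_core_web_sm is just an example English model):

import spacy
import pandas as pd

nlp = spacy.load("en_core_web_sm")  # example English pipeline (assumption)
policy_text = "..."                 # placeholder for the scraped privacy policy text
doc = nlp(policy_text)
# one row per sentence, in a column named "Sent"
df2 = pd.DataFrame({"Sent": [sent.text for sent in doc.sents]})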
From those sentences, I only want to extract the ones that contain the words "personal information".
I used the code below, but I don't get the result I want:
from spacy.matcher import Matcher

def find_names(text):
    names = []
    # spaCy doc (nlp is the pipeline loaded above)
    doc = nlp(text)
    # token pattern: "personal" followed by "data", case-insensitive
    pattern = [{'LOWER': 'personal'},
               {'LOWER': 'data'}]
    # Matcher class object
    matcher = Matcher(nlp.vocab)
    matcher.add("names", [pattern])
    matches = matcher(doc)
    # finding patterns in the text
    for i in range(0, len(matches)):
        # each match is (match_id, start, end); slice the doc to get the matched span
        token = doc[matches[i][1]:matches[i][2]]
        # append the matched text to the list
        names.append(str(token))
    return names

# apply the function to every sentence
df2['PM_Names'] = df2['Sent'].apply(find_names)
The output:
I want to extract only the whole sentences that contain the words "personal information".
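To make that concrete, something along these lines is the behaviour I'm after (a rough, untested sketch; it reuses nlp and df2 from above and assumes each row of 'Sent' already holds a single sentence):

from spacy.matcher import Matcher

def contains_personal_information(text):
    # True if the text contains the phrase "personal information" (case-insensitive)
    doc = nlp(text)
    matcher = Matcher(nlp.vocab)
    matcher.add("personal_info", [[{'LOWER': 'personal'}, {'LOWER': 'information'}]])
    return len(matcher(doc)) > 0

# keep only the sentences that mention "personal information"
personal_sentences = df2[df2['Sent'].apply(contains_personal_information)]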