Saving and loading nlp results in spaCy
I want to use spaCy to analyze many small texts, and I want to store the nlp results for further use to save processing time. I found code at Storing and Loading spaCy Documents Containing Word Vectors, but I get an error and cannot find how to fix it. I am fairly new to Python.
In the following code, I store the nlp results to a file and try to read them again. I can write the first file, but I cannot find the second file (the vocab). I also get two errors: that Doc and Vocab are not defined.
Any idea to fix this, or another method that achieves the same result, is more than welcome.
Thanks!
import spacy

nlp = spacy.load('en_core_web_md')
doc = nlp("He eats a green apple")
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_,
          token.shape_, token.is_alpha, token.is_stop)

NLP_FName = "E:\\SaveTest.nlp"
doc.to_disk(NLP_FName)
Vocab_FName = "E:\\SaveTest.voc"
doc.vocab.to_disk(Vocab_FName)

# To read the data again:
idoc = Doc(Vocab()).from_disk(NLP_FName)
idoc.vocab.from_disk(Vocab_FName)
for token in idoc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_,
          token.shape_, token.is_alpha, token.is_stop)
I tried your code and had a few minor issues, which I fixed in the code below.
Note that SaveTest.nlp is a binary file with your doc info, and SaveTest.voc is a folder with all the spacy model vocab information (vectors, strings, among others).
Changes I made:
- Import the Doc class from spacy.tokens
- Import the Vocab class from spacy.vocab
- Download the en_core_web_md model using the following command: python -m spacy download en_core_web_md
Please note that spacy has multiple models for each language, and usually you have to download them first (typically the sm, md and lg models). Read more about it here.
Code:
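(The answer's code block did not survive the page extraction. Based on the changes listed above, a minimal sketch might look like the following. It uses spacy.blank("en") and a temporary directory so it runs without the en_core_web_md download; with the trained model, lemma_, pos_ and the word vectors would be populated as in the question.)

```python
import os
import tempfile

import spacy
from spacy.tokens import Doc    # fixes "Doc is not defined"
from spacy.vocab import Vocab   # fixes "Vocab is not defined"

# spacy.blank("en") keeps this sketch self-contained; the answer itself
# used en_core_web_md (installed via `python -m spacy download en_core_web_md`).
nlp = spacy.blank("en")
doc = nlp("He eats a green apple")

tmp_dir = tempfile.mkdtemp()
nlp_fname = os.path.join(tmp_dir, "SaveTest.nlp")    # binary file with the doc
vocab_fname = os.path.join(tmp_dir, "SaveTest.voc")  # folder with the vocab

doc.to_disk(nlp_fname)
doc.vocab.to_disk(vocab_fname)

# Read the data back: Doc and Vocab are defined thanks to the imports above
idoc = Doc(Vocab()).from_disk(nlp_fname)
idoc.vocab.from_disk(vocab_fname)
print([token.text for token in idoc])
```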
Let me know if this is helpful to you, and if not, please add your error message to your original question so I can help.
The efficient way to do this is to use a DocBin instead: https://spacy.io/usage/saving-loading#docs
Example adapted from the docs (you can use doc_bin.to_disk/from_disk instead of to_bytes/from_bytes):
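(The DocBin example itself was lost in extraction; the snippet below is adapted from the linked usage docs. It uses spacy.blank("en") so it runs without a model download; with en_core_web_md you would also pass the linguistic attributes you need to DocBin, e.g. attrs=["LEMMA", "POS", "DEP"], and they would round-trip too.)

```python
import os
import tempfile

import spacy
from spacy.tokens import DocBin

# spacy.blank("en") keeps this runnable without a download;
# the thread itself uses en_core_web_md.
nlp = spacy.blank("en")
texts = ["He eats a green apple", "She reads a book"]

# Serialize a whole batch of docs into one compact binary container
doc_bin = DocBin(store_user_data=True)
for doc in nlp.pipe(texts):
    doc_bin.add(doc)

path = os.path.join(tempfile.mkdtemp(), "docs.spacy")
doc_bin.to_disk(path)  # to_bytes()/from_bytes() work the same way

# Later (even in another process): reload against a compatible vocab
doc_bin_loaded = DocBin().from_disk(path)
docs = list(doc_bin_loaded.get_docs(nlp.vocab))
print([token.text for token in docs[0]])
```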
Long shot of getting an answer, but I tried your code and it doesn't work for DocBins. I pasted my code below for the import part.
Error I get: