DocBin to_bytes/to_disk gets killed
I am dealing with fairly big corpora, and my DocBin object gets killed when I try to save it. Both to_disk and to_bytes print "Killed".
My Python knowledge is limited, so it isn't obvious to me how to work around the issue. Can you help?
Here is my code (very straightforward and basic):
```
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")
db = DocBin()  # collects the annotated Docs
for text, annotations in train_data:
    doc = nlp(text)
    ents = []
    for start, end, label in eval(annotations)['entities']:
        span = doc.char_span(start, end, label=label)
        if span is None:
            continue  # skip spans that don't align to token boundaries
        ents.append(span)
    doc.ents = ents
    db.add(doc)
db.to_disk("../Spacy/train.spacy")
```
Comments (1)
You are probably running out of RAM. Instead, save your annotations in multiple DocBin files. If you have multiple .spacy files, you can provide a directory to --paths.train with spacy train instead of a single .spacy file.
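Here is a minimal sketch of that approach, assuming the same train_data structure as in the question. The CHUNK_SIZE value and the ../Spacy/train output directory are arbitrary choices for illustration; tune them to your data and available RAM:

```
import spacy
from spacy.tokens import DocBin
from pathlib import Path

CHUNK_SIZE = 10_000                # hypothetical batch size; lower it if memory is tight
out_dir = Path("../Spacy/train")   # hypothetical output directory for the .spacy files
out_dir.mkdir(parents=True, exist_ok=True)

nlp = spacy.blank("en")
db = DocBin()
file_index = 0

for i, (text, annotations) in enumerate(train_data, start=1):
    doc = nlp(text)
    ents = []
    for start, end, label in eval(annotations)['entities']:
        span = doc.char_span(start, end, label=label)
        if span is None:
            continue
        ents.append(span)
    doc.ents = ents
    db.add(doc)
    if i % CHUNK_SIZE == 0:
        # Flush the current batch to its own file and start a fresh DocBin,
        # so only one chunk of Docs is ever held in memory at a time.
        db.to_disk(out_dir / f"train_{file_index}.spacy")
        db = DocBin()
        file_index += 1

if len(db):
    # Write the final partial batch.
    db.to_disk(out_dir / f"train_{file_index}.spacy")
```

Once the files are written, point training at the directory instead of a single file, for example: python -m spacy train config.cfg --paths.train ../Spacy/train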