How do I load a custom NER model from disk with spaCy?
I have customized the NER pipeline with the following procedure:
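import random

import spacy
from spacy.training import Example

# Assumption: the base pipeline is the same en_core_web_lg model that is loaded later on.
nlp = spacy.load("en_core_web_lg")

# Entities predicted by the stock model, before any custom training
doc = nlp("I am going to Vallila. I am going to Sörnäinen.")
for ent in doc.ents:
    print(ent.text, ent.label_)

LABEL = 'DISTRICT'
TRAIN_DATA = [
    (
        'We need to deliver it to Vallila', {
            'entities': [(25, 32, 'DISTRICT')]
        }),
    (
        'We need to deliver it to somewhere', {
            'entities': []
        }),
]

# Register the new label on the existing NER component
ner = nlp.get_pipe("ner")
ner.add_label(LABEL)

# Disable every other component so that only the NER is updated
nlp.disable_pipes("tagger")
nlp.disable_pipes("parser")
nlp.disable_pipes("attribute_ruler")
nlp.disable_pipes("lemmatizer")
nlp.disable_pipes("tok2vec")
optimizer = nlp.get_pipe("ner").create_optimizer()

# Custom training loop: 25 passes over the shuffled examples
for i in range(25):
    random.shuffle(TRAIN_DATA)
    for text, annotation in TRAIN_DATA:
        example = Example.from_dict(nlp.make_doc(text), annotation)
        nlp.update([example], sgd=optimizer)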
I tried to save the custom NER component to disk and load it again with the following code:
ner.to_disk('/home/feru/ner')
import spacy
from spacy.pipeline import EntityRecognizer
nlp = spacy.load("en_core_web_lg", disable=['ner'])
ner = EntityRecognizer(nlp.vocab)
ner.from_disk('/home/feru/ner')
nlp.add_pipe(ner)
But I got the following error:
---> 10 ner = EntityRecognizer(nlp.vocab)
     11 ner.from_disk('/home/feru/ner')
     12 nlp.add_pipe(ner)
~/.local/lib/python3.8/site-packages/spacy/pipeline/ner.pyx in spacy.pipeline.ner.EntityRecognizer.__init__()
TypeError: __init__() takes at least 2 positional arguments (1 given)
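This way of saving and loading a custom component from disk seems to come from an early spaCy version. What second argument does EntityRecognizer need?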
Comments (1)
The general process you are following, serializing a single component and reloading it, is not the recommended way to do this in spaCy. You can do it (it has to be done internally, of course), but you generally want to save and load whole pipelines using the high-level wrappers. In this case, that means you would save like this:
nlp.to_disk("my_model")  # save the whole pipeline, not just the ner component
And then load it with:
spacy.load("my_model")
You can find more detail about this in the saving and loading docs. Since it seems you're just getting started with spaCy, you might want to go through the course too. It covers the new config-based training in v3, which is much easier than using your own custom training loop like the one in your code sample.
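As a minimal sketch of the pipeline-level round trip described above, the following saves a whole pipeline and loads it back. To stay small it makes several assumptions that are not part of the question: it uses a blank English pipeline (`spacy.blank("en")`) instead of `en_core_web_lg`, it only initializes the NER component rather than training it, and it writes to a temporary directory with the illustrative name `my_model`.

```python
import tempfile
from pathlib import Path

import spacy
from spacy.training import Example

TRAIN_DATA = [
    ("We need to deliver it to Vallila", {"entities": [(25, 32, "DISTRICT")]}),
    ("We need to deliver it to somewhere", {"entities": []}),
]

# Assumption: a blank pipeline stands in for the trained en_core_web_lg pipeline.
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
ner.add_label("DISTRICT")

# Initialize the model weights from the examples (no real training here).
examples = [Example.from_dict(nlp.make_doc(text), ann) for text, ann in TRAIN_DATA]
nlp.initialize(lambda: examples)

with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "my_model"  # hypothetical output location
    nlp.to_disk(model_dir)              # saves config, vocab and every component
    reloaded = spacy.load(model_dir)    # rebuilds the pipeline from that directory

print(reloaded.pipe_names)
print(reloaded.get_pipe("ner").labels)
```

Because the directory also stores the pipeline config, `spacy.load` can reconstruct the `ner` component itself, so there is no need to call the `EntityRecognizer` constructor by hand.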
If you want to mix and match components from different pipelines, you will still generally want to save entire pipelines, and you can then combine components from them using the "sourcing" feature.
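A rough sketch of sourcing from Python, using the `source` argument of `nlp.add_pipe`, might look like the following. Both pipelines here are blank stand-ins chosen so that the snippet runs without downloading a model: `custom_nlp` plays the role of the saved custom pipeline (in practice it would come from something like `spacy.load("my_model")`), and `base_nlp` plays the role of the pipeline that should receive the component. The `sentencizer` is only there to show that the target pipeline can have components of its own.

```python
import spacy
from spacy.training import Example

TRAIN_DATA = [
    ("We need to deliver it to Vallila", {"entities": [(25, 32, "DISTRICT")]}),
    ("We need to deliver it to somewhere", {"entities": []}),
]

# Assumption: stand-in for the custom pipeline that was saved to disk earlier.
custom_nlp = spacy.blank("en")
custom_ner = custom_nlp.add_pipe("ner")
custom_ner.add_label("DISTRICT")
examples = [Example.from_dict(custom_nlp.make_doc(t), a) for t, a in TRAIN_DATA]
custom_nlp.initialize(lambda: examples)

# Assumption: stand-in for the pipeline that should receive the custom NER.
base_nlp = spacy.blank("en")
base_nlp.add_pipe("sentencizer")

# Source the "ner" component from the custom pipeline instead of building a new one.
base_nlp.add_pipe("ner", source=custom_nlp)

print(base_nlp.pipe_names)
print(base_nlp.get_pipe("ner").labels)
```

When the two pipelines use different word vectors, spaCy may warn that the sourced component could behave differently, so it is worth checking the predictions after combining pipelines this way.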