How do I load a custom NER model from disk with spaCy?

Posted 2025-01-25 21:58:38


I have customized the NER pipeline with the following procedure:

import spacy

# assumption: the base pipeline is en_core_web_lg, which is the model reloaded later in the question
nlp = spacy.load("en_core_web_lg")

doc = nlp("I am going to Vallila. I am going to Sörnäinen.")
for ent in doc.ents:
    print(ent.text, ent.label_)

LABEL = 'DISTRICT'
TRAIN_DATA = [
    (
    'We need to deliver it to Vallila', {
        'entities': [(25, 32, 'DISTRICT')]
    }),
    (
    'We need to deliver it to somewhere', {
        'entities': []
    }),
]

ner = nlp.get_pipe("ner")
ner.add_label(LABEL)

nlp.disable_pipes("tagger")
nlp.disable_pipes("parser")
nlp.disable_pipes("attribute_ruler")
nlp.disable_pipes("lemmatizer")
nlp.disable_pipes("tok2vec")

optimizer = nlp.get_pipe("ner").create_optimizer()
import random
from spacy.training import Example

for i in range(25):
    random.shuffle(TRAIN_DATA)
    for text, annotation in TRAIN_DATA:
        example = Example.from_dict(nlp.make_doc(text), annotation)
        nlp.update([example], sgd=optimizer)

I tried to save the customized NER component to disk and load it again with the following code:

ner.to_disk('/home/feru/ner')

import spacy
from spacy.pipeline import EntityRecognizer
nlp = spacy.load("en_core_web_lg", disable=['ner'])

ner = EntityRecognizer(nlp.vocab)
ner.from_disk('/home/feru/ner')
nlp.add_pipe(ner)

However, I got the following error:

---> 10 ner = EntityRecognizer(nlp.vocab)
11 ner.from_disk('/home/feru/ner')
12 nlp.add_pipe(ner)

~/.local/lib/python3.8/site-packages/spacy/pipeline/ner.pyx in
spacy.pipeline.ner.EntityRecognizer.__init__()

TypeError: __init__() takes at least 2 positional arguments (1 given)

This method of saving and loading a custom component from disk seems to come from some early spaCy version. What is the second argument that EntityRecognizer needs?


Comments (1)

£冰雨忧蓝° 2025-02-01 21:58:38


The general process you are following of serializing a single component and reloading it is not the recommended way to do this in spaCy. You can do it - it has to be done internally, of course - but you generally want to save and load pipelines using high-level wrappers. In this case, that means you would save like this:

nlp.to_disk("my_model") # NOT ner.to_disk

And then load it with spacy.load("my_model").
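
Putting those two calls together, a minimal sketch of the recommended flow could look like this (the "my_model" path and the test sentence are placeholders, and it assumes the training loop from the question has already run):

# save the whole pipeline, not just the NER component
nlp.to_disk("my_model")

# later, in a fresh session
import spacy
nlp2 = spacy.load("my_model")   # restores every component, including the custom NER

doc = nlp2("We need to deliver it to Vallila")
print([(ent.text, ent.label_) for ent in doc.ents])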

You can find more detail about this in the saving and loading docs. Since it seems you're just getting started with spaCy, you might want to go through the course too. It covers the new config-based training in v3, which is much easier than using your own custom training loop like in your code sample.
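
As a rough illustration of that config-based workflow (the file names and paths here are just examples), the training data is first converted into serialized .spacy files and the rest is driven from the command line:

import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")
db = DocBin()
for text, annotation in TRAIN_DATA:
    doc = nlp.make_doc(text)
    spans = []
    for start, end, label in annotation["entities"]:
        span = doc.char_span(start, end, label=label)
        if span is not None:   # skip annotations that don't align with token boundaries
            spans.append(span)
    doc.ents = spans
    db.add(doc)
db.to_disk("./train.spacy")

# then, roughly:
#   python -m spacy init config config.cfg --lang en --pipeline ner
#   python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy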

If you want to mix and match components from different pipelines, you still will generally want to save entire pipelines, and you can then combine components from them using the "sourcing" feature.
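
A short sketch of what sourcing looks like, assuming the trained pipeline was saved as "my_model" as above:

import spacy

source_nlp = spacy.load("my_model")                  # pipeline that owns the trained NER
nlp = spacy.load("en_core_web_lg", exclude=["ner"])  # base pipeline without its own NER
nlp.add_pipe("ner", source=source_nlp)               # copy the component instead of rebuilding it

doc = nlp("We need to deliver it to Vallila")
print([(ent.text, ent.label_) for ent in doc.ents])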
