No POS tags in a newly trained spaCy NER model, how do I enable them?

Posted 2025-02-13 20:33:05

I trained an NER model following the spaCy Training Quickstart and only enabled the ner pipeline for training, since NER annotations are the only data I have.

Here is the relevant part of the config:

[nlp]
lang = "en"
pipeline = ["tok2vec","ner","tagger"]
batch_size = 1000
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
...
[components.tagger]
source = "en_core_web_sm"
component = "tagger"
replace_listeners = ["model.tok2vec"]
...
[training]
...
frozen_components = ["tagger"]

Now when I get entity predictions, there are no POS tags.

For example, an ent in doc.ents will have no pos_ on its tokens:

>>> ent
Some Entity
>>> ent.label_
'LABEL_NAME'
>>> [token.pos_ for token in ent]
['', '']
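
A quick sanity check here (a minimal sketch; output/model-best is a hypothetical path, substitute wherever your trained pipeline was saved) is to confirm that the tagger is actually present and enabled in the loaded pipeline, and to inspect both the fine-grained tag_ and the coarse pos_:

import spacy

# Hypothetical path; use your own trained pipeline directory.
nlp = spacy.load("output/model-best")

# "tagger" should appear in the pipeline order and not be disabled.
print(nlp.pipe_names)
print(nlp.disabled)

doc = nlp("Apple is looking at buying a U.K. startup.")
# Empty tag_ values mean the tagger never produced predictions.
print([(token.text, token.tag_, token.pos_) for token in doc])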

So how do I train only the ner pipeline while still allowing POS tags to be predicted with the tagger?

Is there a way to load the POS tag predictions from another model, for example using en_core_web_sm for the tagger and my trained model for the ner?

I am trying to use frozen_components, but it does not seem to work.

超可爱的懒熊 2025-02-20 20:33:05

Yes, you can "source" a component from a different pipeline. See the sourcing components docs for general information about that, or the double NER project for an example of doing it with two NER components.

Basically you can do this:

import spacy

nlp = spacy.load("my_ner")
nlp_tagger = spacy.load("en_core_web_sm") # load the base pipeline
# give this component a copy of its own tok2vec
nlp_tagger.replace_listeners("tok2vec", "tagger", ["model.tok2vec"])

nlp.add_pipe(
    "tagger",
    name="tagger",
    source=nlp_tagger,
    after="ner",
)
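
To check that the combined pipeline works end to end, a short follow-up (the sentence is an arbitrary example):

doc = nlp("Apple is looking at buying a U.K. startup.")

for ent in doc.ents:
    # Tokens inside entities should now carry predictions from the sourced
    # tagger; tag_ comes directly from it, while pos_ may also depend on an
    # attribute_ruler component in some pipelines.
    print(ent.text, ent.label_, [(t.tag_, t.pos_) for t in ent])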

Note that both pipelines need to have the same word vectors, or this won't work, as described in the sourcing components docs. In this case the sm model has no word vectors, so this will work as long as your pipeline also has no word vectors.
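
If you are unsure whether the vectors match, a minimal check, assuming nlp and nlp_tagger are loaded as in the snippet above:

# Both pipelines should report the same vector table shape;
# (0, 0) means no word vectors at all.
print(nlp.vocab.vectors.shape)
print(nlp_tagger.vocab.vectors.shape)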
