How to create a custom spaCy pipeline component using a Thinc model

Posted on 2025-02-09 09:56:20

I'd like to create a custom pipeline component in spaCy that uses a pre-trained Thinc model. I want to modify the output predictions from Thinc and then pass the modified values back into the pipeline, i.e. effectively modify the ner pipeline component.

I was thinking of doing this via a custom pipeline component, something like:

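import numpy as np
import spacy
from spacy.language import Language

@Language.component("my_ner")
def my_ner(doc):
    # Placeholders: thinc_do_something, data, model and num_samples are not
    # defined yet; obtaining them inside the component is what I'm asking about.
    class_probabilities = thinc_do_something(data, model, num_samples)
    class_value = np.argmax(class_probabilities, axis=1)  # most likely class per row
    # class_value is not yet written back to the doc
    return doc

nlp = spacy.load("en_core_web_sm", exclude=["ner"])
nlp.add_pipe("my_ner", after="parser")  # Insert after the parser
print(nlp.pipe_names)  # e.g. [..., 'parser', 'my_ner', ...]
doc = nlp("This is a sentence.")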

My aim is for the pipeline to run as it does with the original ner component, but with my custom ner component modifying the class probabilities. Unfortunately, I can't work out from the spaCy documentation:

How do I access the pre-trained model from inside the pipeline?
How do I access the data used for the model prediction within the pipeline?
Where do I need to write the model's predicted values back to as part of my modified ner pipeline?
Is there a better way of doing this?
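
A minimal sketch of one possible shape for such a component, not a confirmed answer. It assumes a custom model that yields one row of class probabilities per token; the built-in ner component may not expose predictions in that form. The names `LABELS`, `toy_scores`, `ReweightedNER`, `reweighted_ner` and the weight values are all hypothetical, and `toy_scores` merely stands in for a real Thinc model. The idea is that a stateful component registered with `@Language.factory` can hold the model, read its input from the `doc` it receives, and write its output back to `doc.ents`.

```python
import numpy as np
import spacy
from spacy.language import Language
from spacy.tokens import Span

LABELS = ["O", "PERSON", "ORG"]  # hypothetical label set; "O" means no entity


def toy_scores(doc):
    """Hypothetical stand-in for a Thinc model: one probability row per token."""
    scores = np.empty((len(doc), len(LABELS)))
    for i, token in enumerate(doc):
        # Title-cased tokens lean towards PERSON, everything else towards "O".
        scores[i] = [0.2, 0.5, 0.3] if token.is_title else [0.9, 0.05, 0.05]
    return scores


class ReweightedNER:
    """Stateful component: holds the scorer and the per-class weights."""

    def __init__(self, scorer, class_weights):
        self.scorer = scorer
        self.class_weights = np.asarray(class_weights)

    def __call__(self, doc):
        probs = self.scorer(doc) * self.class_weights  # modify the predictions
        class_ids = np.argmax(probs, axis=1)  # best class per token
        # Write the result back as single-token entity spans on doc.ents.
        doc.ents = [
            Span(doc, i, i + 1, label=LABELS[idx])
            for i, idx in enumerate(class_ids)
            if LABELS[idx] != "O"
        ]
        return doc


@Language.factory("reweighted_ner")  # hypothetical component name
def create_reweighted_ner(nlp, name):
    # Halving the PERSON weight is an arbitrary choice for illustration.
    return ReweightedNER(toy_scores, class_weights=[1.0, 0.5, 1.0])


nlp = spacy.blank("en")  # blank pipeline, so no trained package is needed
nlp.add_pipe("reweighted_ner")
doc = nlp("Alice met Bob today.")
print([(ent.text, ent.label_) for ent in doc.ents])
```

With these toy numbers, the reweighting should push the title-cased tokens from PERSON to ORG, which shows the modified probabilities, rather than the raw ones, deciding the entities. Adjacent tokens are not merged into multi-token spans here; a real component would probably need that step.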
