How to create a custom spaCy pipeline component using a Thinc model
I'd like to create a custom pipeline component in spaCy that uses a pre-trained Thinc model. I'd like to modify the output prediction from Thinc and then pass the modified value back into the pipeline, i.e. effectively modify the ner pipeline component.
I was thinking of doing this via a custom pipeline component, something like:
import numpy as np
import spacy
from spacy.language import Language

@Language.component("my_ner")
def my_ner(doc):
    # thinc_do_something, data, model and num_samples are placeholders:
    # how to obtain them inside the pipeline is exactly what I am asking.
    class_probabilities = thinc_do_something(data, model, num_samples)
    class_value = np.argmax(class_probabilities, axis=1)
    return doc

nlp = spacy.load("en_core_web_sm", exclude=["ner"])
nlp.add_pipe("my_ner", after="parser")  # Insert after the parser
print(nlp.pipe_names)  # 'my_ner' should be listed right after 'parser'
doc = nlp("This is a sentence.")
My aim is for the pipe to run as per the original ner component, but with my custom ner component modifying the class probabilities. Unfortunately, I can't work out from the spaCy documentation:
- How do I access the pre-trained model from inside the pipeline?
- How do I access the data used for the model prediction within the pipeline?
- Where do I need to write the predicted value back to as part of my modified ner pipeline?
- Is there a better way of doing this?
Comments (1)
I have not heard of anyone doing something like that before. While it is possible, it is not as simple as you suggest. The example component you have is a simple stateless component that is just a function. To modify how a trainable pipe works, you would have to make your own pipe, by subclassing an existing one or otherwise.
You should look at existing pipes for reference; textcat is probably one of the simpler ones. When used in a pipeline, trainable pipes basically call predict and then set_annotations, as shown in the TrainablePipe implementation. Rather than subclassing, it might also be easier to just copy the component you want to use, modify a few bits, and give it a new name.
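To make the predict / set_annotations split a bit more concrete, here is a minimal sketch of a stateful component that follows the same pattern. It is not the built-in ner pipe and not a TrainablePipe subclass. The factory name `thinc_postprocessor`, the class, the labels, the `boost` setting and the two toy features are all made up for illustration, and an untrained Thinc `Softmax` layer stands in for your pre-trained model. It writes to `doc.cats` to keep things short; an NER-style component would set `doc.ents` in `set_annotations` instead.

```python
import numpy as np
import spacy
from spacy.language import Language
from thinc.api import Softmax


class ThincPostProcessor:
    """Hypothetical component: holds a Thinc model and adjusts its output."""

    def __init__(self, boost: float):
        self.labels = ["LABEL_A", "LABEL_B"]  # made-up labels
        self.boost = boost
        # Stand-in for a pre-trained model: 2 input features, 2 classes.
        self.model = Softmax(nO=len(self.labels), nI=2)
        self.model.initialize()

    def featurize(self, docs):
        # Toy features (assumption): token count and title-cased token count.
        rows = [[len(doc), sum(t.is_title for t in doc)] for doc in docs]
        return np.asarray(rows, dtype="float32")

    def predict(self, docs):
        # The model lives on the component; the data are the Doc objects.
        probs = self.model.predict(self.featurize(docs)).copy()
        # Modify the predictions: favour the first class, then renormalise.
        probs[:, 0] += self.boost
        probs /= probs.sum(axis=1, keepdims=True)
        return probs

    def set_annotations(self, docs, probs):
        # Write the modified predictions back onto each Doc.
        for doc, row in zip(docs, probs):
            for label, p in zip(self.labels, row):
                doc.cats[label] = float(p)

    def __call__(self, doc):
        # Same two-step flow the answer describes for trainable pipes.
        scores = self.predict([doc])
        self.set_annotations([doc], scores)
        return doc


@Language.factory("thinc_postprocessor", default_config={"boost": 0.1})
def make_thinc_postprocessor(nlp, name, boost: float):
    return ThincPostProcessor(boost)


nlp = spacy.blank("en")  # blank pipeline, so no model download is needed
nlp.add_pipe("thinc_postprocessor")
doc = nlp("This is a sentence.")
print(nlp.pipe_names, doc.cats)
```

Using a factory rather than `@Language.component` is what lets the component keep state such as the model. For the real ner pipe the values returned by predict are not a plain probability matrix, so copying or subclassing the existing component, as suggested above, is likely still needed there.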