spaCy custom entity training returns nothing
I have product descriptions from which I want to extract colours, so I thought I would use spaCy's NER.
I have about 8000 lines of data like this:
import spacy

nlp = spacy.load('en_core_web_sm')
# Getting the pipeline component
ner = nlp.get_pipe("ner")

Train_data = [
    ("Bermuda shorts anthracite/black", {"entities": [(15, 31, "COL")]}),
    ("Acrylic antique white", {"entities": [(8, 22, "COL")]}),
    ("Pincer black", {"entities": [(8, 13, "COL")]}),
    ("Cable tie black", {"entities": [(10, 15, "COL")]}),
    ("Water pump pliers blue", {"entities": [(18, 22, "COL")]}),
]
My code is:
# Register every label that occurs in the training data
for _, annotations in Train_data:
    for ent in annotations.get("entities"):
        ner.add_label(ent[2])

pipe_exceptions = ["ner", "trf_wordpiecer", "trf_tok2vec"]
unaffected_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]

from spacy.training.example import Example

losses = {}  # not defined in the original snippet; nlp.update() records losses here
for batch in spacy.util.minibatch(Train_data, size=2):
    for text, annotations in batch:
        # create Example
        doc = nlp.make_doc(text)
        example = Example.from_dict(doc, annotations)
        # Update the model
        nlp.update([example], losses=losses, drop=0.3)
When I test the model, I get nothing:
doc = nlp("Bill Gates has a anthracite house worth 10 EUR.")
print("Entities", [(ent.text, ent.label_) for ent in doc.ents])
What am I doing wrong? Please help.
Comments (2)
There are several problems with your code.
Where did you save your model?
There is nothing in your code to indicate you saved and reloaded your model. When you train a model like that, you aren't modifying the existing model on disk. If you don't save the model after training, it's just gone, which would mean you get no color annotations.
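As a rough sketch of the save-and-reload step, assuming spaCy v3: the blank pipeline, the two-example data set, the iteration count and the `colour_ner` directory name are placeholders picked for illustration here, not taken from the question.

```python
import tempfile
from pathlib import Path

import spacy
from spacy.training import Example

# Assumption: a blank English pipeline stands in for the trained one,
# so the sketch runs without downloading en_core_web_sm.
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
ner.add_label("COL")

# Two examples from the question whose offsets match the text.
train_data = [
    ("Cable tie black", {"entities": [(10, 15, "COL")]}),
    ("Water pump pliers blue", {"entities": [(18, 22, "COL")]}),
]
examples = [Example.from_dict(nlp.make_doc(text), ann) for text, ann in train_data]

# Initialize the weights, then run a few illustrative update steps.
optimizer = nlp.initialize(lambda: examples)
for _ in range(20):
    losses = {}
    nlp.update(examples, sgd=optimizer, losses=losses, drop=0.3)

# Write the trained pipeline to disk and load it back from there.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "colour_ner"  # hypothetical directory name
    nlp.to_disk(model_dir)
    reloaded = spacy.load(model_dir)

print(reloaded.pipe_names)
print(reloaded.get_pipe("ner").labels)
```

In a real project you would write to a permanent directory instead of a temporary one, and call `spacy.load` on that path in whichever script does the predictions.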
Your input doesn't look like your training data!
Your input is a complete sentence, but your training data is isolated phrases. This will result in poor performance, as the model isn't sure what to do with colors and, say, verbs. (You would probably still get some annotations though.)
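On the training-data side, it may also be worth checking the character offsets themselves. In the sample from the question, two spans appear to run one character past the end of their text ("Acrylic antique white" has 21 characters and "Pincer black" has 12), which can keep an entity from being used for training. The check below is a plain-Python sketch; `find_bad_spans` is a hypothetical helper, and the whitespace test is only a rough stand-in for spaCy's real token boundaries.

```python
# Sample data copied from the question.
train_data = [
    ("Bermuda shorts anthracite/black", {"entities": [(15, 31, "COL")]}),
    ("Acrylic antique white", {"entities": [(8, 22, "COL")]}),
    ("Pincer black", {"entities": [(8, 13, "COL")]}),
    ("Cable tie black", {"entities": [(10, 15, "COL")]}),
    ("Water pump pliers blue", {"entities": [(18, 22, "COL")]}),
]


def find_bad_spans(data):
    """Return (text, start, end) for spans that look misaligned."""
    bad = []
    for text, annotations in data:
        for start, end, _label in annotations["entities"]:
            # The span has to lie inside the text.
            in_range = 0 <= start < end <= len(text)
            # Rough boundary test: the span should begin and end at
            # whitespace or at the edge of the string.
            starts_clean = in_range and (start == 0 or text[start - 1].isspace())
            ends_clean = in_range and (end == len(text) or text[end].isspace())
            if not (in_range and starts_clean and ends_clean):
                bad.append((text, start, end))
    return bad


for text, start, end in find_bad_spans(train_data):
    print(f"{text!r}: span ({start}, {end}) looks misaligned (text length {len(text)})")
```

Running a check like this over all 8000 lines before training is cheap and catches off-by-one annotations early.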
I strongly suggest you go through the spaCy course, which covers training your own NER model. I also strongly recommend you use the v3 config-based training instead of writing your own training loop.
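For the config-based route, the training data first has to be converted to spaCy's binary `.spacy` format. A minimal sketch of that conversion, assuming spaCy v3; the three examples are the ones from the question whose offsets match the text, and the `train.spacy` file name and temporary output directory are placeholders.

```python
import tempfile
from pathlib import Path

import spacy
from spacy.tokens import DocBin

train_data = [
    ("Bermuda shorts anthracite/black", {"entities": [(15, 31, "COL")]}),
    ("Cable tie black", {"entities": [(10, 15, "COL")]}),
    ("Water pump pliers blue", {"entities": [(18, 22, "COL")]}),
]

nlp = spacy.blank("en")  # only the tokenizer is needed for the conversion
db = DocBin()
skipped = []
for text, annotations in train_data:
    doc = nlp.make_doc(text)
    spans = []
    for start, end, label in annotations["entities"]:
        # char_span returns None when the offsets do not match token boundaries.
        span = doc.char_span(start, end, label=label)
        if span is None:
            skipped.append((text, start, end))
        else:
            spans.append(span)
    doc.ents = spans
    db.add(doc)

out_path = Path(tempfile.mkdtemp()) / "train.spacy"  # placeholder location
db.to_disk(out_path)
print(f"wrote {len(db)} docs to {out_path}, skipped {len(skipped)} spans")
```

With a training file and a separate dev file prepared the same way, a config can be generated with `python -m spacy init config config.cfg --lang en --pipeline ner` and training started with `python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy`. The trained pipeline is then written to the output directory, so the saving step is handled for you.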
When I try to run your example:
When I run the below code I get:
#output