How do I set an extension attribute on a spaCy Doc so that it can be retrieved from a slice (Span) of the Doc?
I want to add an extension attribute to a spaCy Doc that covers one or more tokens, similar to the entity attribute, so that it can also be accessed from any Span that contains it. To clarify: below, I assign a list containing a Span to doc.ents. If I then take a slice of the doc that contains the added entity, I can still find the entity.
import spacy
from spacy.tokens import Span
nlp = spacy.load("en_core_web_sm")
doc = nlp("This is some country. Another sentence")
doc.ents = [Span(doc, 2, 4, "GPE")]  # doc[2:4] is "some country"
print(doc[1:6].ents)
#[some country]
However, that is not the case with an extension attribute:
from spacy.tokens import Doc, Span
Doc.set_extension('my_extension', default=None)
Span.set_extension('my_extension', default=None)
doc._.my_extension = [Span(doc, 2, 4, "GPE")]  # stored on the Doc only
print(doc[1:6]._.my_extension)
#None
What do I need to do with the extension so that it behaves like the entity property?
Comments (1)
The reason you can call .ents on a span and get a result is that the value is reconstructed from values stored on the tokens. If you want similar behavior for a custom extension, you'll need to create a span extension whose setter writes token attributes and whose getter uses those token attributes to compute the return value.
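A minimal sketch of that idea, under some assumptions: the token-level attribute names `my_iob` and `my_label` and the helper functions `set_my_spans` / `get_my_spans` are made up for illustration, and the IOB-style tagging is just one possible way to store spans on tokens. It uses `spacy.blank("en")` so no trained model is needed, and, like `.ents`, it only returns spans that lie fully inside the slice.

```python
import spacy
from spacy.tokens import Doc, Span, Token

# Hypothetical per-token storage: an IOB-style tag plus a label string.
Token.set_extension("my_iob", default="O", force=True)
Token.set_extension("my_label", default="", force=True)


def set_my_spans(container, spans):
    """Setter: reset the tokens of the Doc/Span, then tag the given spans."""
    for token in container:
        token._.my_iob, token._.my_label = "O", ""
    for span in spans:
        for offset, token in enumerate(span):
            token._.my_iob = "B" if offset == 0 else "I"
            token._.my_label = span.label_


def get_my_spans(container):
    """Getter: rebuild spans from the token tags of a Doc or a Span."""
    doc = container if isinstance(container, Doc) else container.doc
    spans, start, token = [], None, None
    for token in container:
        iob = token._.my_iob
        # A tag other than "I" closes any span that is currently open.
        if start is not None and iob != "I":
            spans.append(Span(doc, start, token.i, label=doc[start]._.my_label))
            start = None
        if iob == "B":
            start = token.i
    if start is not None:
        end = token.i + 1
        # Skip a span that continues beyond the end of this slice.
        if end == len(doc) or doc[end]._.my_iob != "I":
            spans.append(Span(doc, start, end, label=doc[start]._.my_label))
    return spans


# Register the same getter/setter pair on both Doc and Span.
Doc.set_extension("my_extension", getter=get_my_spans, setter=set_my_spans, force=True)
Span.set_extension("my_extension", getter=get_my_spans, setter=set_my_spans, force=True)

nlp = spacy.blank("en")
doc = nlp("This is some country. Another sentence")
doc._.my_extension = [Span(doc, 2, 4, label="GPE")]

print(doc[1:6]._.my_extension)  # [some country]
print(doc[0:3]._.my_extension)  # [] (the span is only partly inside the slice)
```

Because the data lives on the tokens, any slice of the doc sees it. With this simple tagging, the stored spans cannot overlap.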