How do I set an extension attribute on a spaCy Doc so that it can be retrieved from a slice (Span) of the Doc?

Posted 2025-01-18 16:12:53

I want to add an extension attribute to a spaCy Doc that covers one or more tokens, similar to the entity attribute, so that it can also be accessed from any span that contains it. To illustrate: below I assign a list containing a Span to doc.ents. If I then take only a slice of the doc (one that contains the added entity), I can still find that entity.

import spacy
from spacy.tokens import Span

nlp = spacy.load("en_core_web_sm")

doc = nlp("This is some country. Another sentence")

doc.ents = [Span(doc, 2, 4, "GPE")] #doc[2:4] = "some country"
print(doc[1:6].ents)
#[some country]
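The slice can still find the entity because entities are stored per token, as an IOB tag plus an entity type, and the entity spans are rebuilt from that token annotation when they are read. A minimal check, assuming `spacy.blank("en")` in place of `en_core_web_sm` so that no trained pipeline has to be downloaded:

```python
import spacy
from spacy.tokens import Span

# A blank English pipeline is enough here: only the tokenizer is needed.
nlp = spacy.blank("en")
doc = nlp("This is some country. Another sentence")
doc.ents = [Span(doc, 2, 4, label="GPE")]

# Assigning doc.ents writes an IOB tag and an entity type onto each token.
for token in doc[1:6]:
    print(token.text, token.ent_iob_, token.ent_type_)

# The slice still finds the entity, which is read back from the token annotation.
print([ent.text for ent in doc[1:6].ents])  # ['some country']
```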

However, that is not the case with an extension attribute:

from spacy.tokens import Doc, Span

Doc.set_extension('my_extension', default=None)
Span.set_extension('my_extension', default=None)

doc._.my_extension = [Span(doc, 2, 4, "GPE")]
print(doc[1:6]._.my_extension)
#None

What do I need to do with the extension so that it behaves like the entity property?

Comments (1)

挥剑断情 2025-01-25 16:12:53

The reason you can call .ents on a span and get a result is that the value is reconstructed from annotations stored on the tokens (token.ent_iob_ and token.ent_type_). If you want similar behavior for a custom extension, you need an extension that writes token attributes when it is set and, when it is read, uses those token attributes to compute the return value.
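A minimal sketch of that approach, assuming non-overlapping spans. The per-token storage uses two hypothetical token extensions, `my_label` and `my_begin`, loosely modelled on `ent_type_` and `ent_iob_`. The setter is registered on the Doc extension, since that is where the question assigns the value; both the Doc getter and the Span getter rebuild the spans from the tokens. `spacy.blank("en")` stands in for `en_core_web_sm` so the example runs without a trained pipeline.

```python
import spacy
from spacy.tokens import Doc, Span, Token

# Hypothetical token-level storage: "my_label" marks every token inside a stored
# span, "my_begin" marks its first token so adjacent spans are not merged.
Token.set_extension("my_label", default=None, force=True)
Token.set_extension("my_begin", default=False, force=True)


def spans_in_range(doc, start, end):
    """Rebuild the stored spans that lie entirely within doc[start:end]."""
    spans = []
    i = start
    while i < end:
        if not doc[i]._.my_begin:
            i += 1
            continue
        j = i + 1
        # Walk forward over continuation tokens (labelled, but not a new begin).
        while j < len(doc) and doc[j]._.my_label is not None and not doc[j]._.my_begin:
            j += 1
        if j <= end:  # skip spans that run past the end of the window
            spans.append(Span(doc, i, j, label=doc[i]._.my_label))
        i = j
    return spans


def set_doc_spans(doc, spans):
    """Write the spans onto the tokens instead of storing the list itself."""
    for token in doc:
        token._.my_label = None
        token._.my_begin = False
    for span in spans:  # assumes the spans do not overlap
        span[0]._.my_begin = True
        for token in span:
            token._.my_label = span.label_


# force=True replaces any default-valued extensions registered earlier.
Doc.set_extension("my_extension",
                  getter=lambda d: spans_in_range(d, 0, len(d)),
                  setter=set_doc_spans, force=True)
Span.set_extension("my_extension",
                   getter=lambda s: spans_in_range(s.doc, s.start, s.end),
                   force=True)

nlp = spacy.blank("en")
doc = nlp("This is some country. Another sentence")
doc._.my_extension = [Span(doc, 2, 4, label="GPE")]
print(doc[1:6]._.my_extension)  # [some country]
print(doc[3:7]._.my_extension)  # []
```

Like entities, this token encoding cannot represent overlapping spans. If you need overlaps, a simpler alternative is a Span getter that filters a list stored on the Doc by `span.start` and `span.end`.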
