How can I include custom attributes in spaCy's Doc.from_docs function?
There is a method Doc.from_docs() in spaCy 3 (here are links to the code and the high-level documentation) which would come in handy for a project I'm working on. This method concatenates a list of Doc objects into a single Doc object.
On a high level, here's what the method does:
1. convert all input Doc objects to numpy arrays via Doc.to_array()
2. concatenate the resulting arrays
3. create a single Doc object from the numpy array resulting from step (2) via Doc.from_array()
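For illustration, the three steps can be sketched roughly as below. This is only an approximation under my own assumptions, not spaCy's actual implementation: concat_docs is a hypothetical helper, only the POS and TAG columns are copied, all docs are assumed to share one vocab, and a space is forced between neighbouring docs.

```python
import numpy as np
import spacy
from spacy.attrs import POS, TAG
from spacy.tokens import Doc


def concat_docs(docs, attrs):
    """Hypothetical helper mirroring the three steps; not spaCy's own code."""
    vocab = docs[0].vocab  # assumption: every doc uses the same vocab
    words, spaces = [], []
    for i, doc in enumerate(docs):
        doc_spaces = [bool(t.whitespace_) for t in doc]
        if doc_spaces and i < len(docs) - 1:
            doc_spaces[-1] = True  # keep a space between neighbouring docs
        words.extend(t.text for t in doc)
        spaces.extend(doc_spaces)
    arrays = [doc.to_array(attrs) for doc in docs]    # step 1
    combined = np.concatenate(arrays)                 # step 2
    merged = Doc(vocab, words=words, spaces=spaces)   # step 3
    merged.from_array(attrs, combined)
    return merged


nlp = spacy.blank("en")  # blank pipeline, so no model download is needed
doc1, doc2 = nlp("Hello world."), nlp("How are you?")
doc1[0].pos_ = "INTJ"  # set native attributes by hand for the demo
doc2[0].tag_ = "WRB"
merged = concat_docs([doc1, doc2], [POS, TAG])
print(merged.text, [(t.text, t.pos_, t.tag_) for t in merged])
```

Only the columns listed in attrs travel through the arrays, which is why anything stored outside those token-level arrays needs separate handling.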
I would like to find out how to make this from_docs method also take custom attributes into account. Currently, native spaCy attributes such as "POS" or "DEP" are considered, and any related tags are transferred from the original input Doc objects to the resulting concatenated Doc object. However, any custom attribute extensions (i.e. Doc._.*) are lost when executing this method.
Does anyone know how to include custom attributes in the Doc.from_docs() method?
Thank you for any hints.
Comments (1)
Doc.from_docs only includes Token and Span extensions. (What value should the final merged doc include for doc1._.ext = True + doc2._.ext = False?)
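One possible workaround, sketched under my own assumptions: let Doc.from_docs handle the Token and Span extensions, then set the Doc-level extension on the merged doc yourself with whatever merge rule suits your data. The helper merge_with_doc_extension, the extension names "ext" and "flag", and the choice of any() as the rule are all hypothetical.

```python
import warnings

import spacy
from spacy.tokens import Doc, Token

# Hypothetical extensions, registered only for this demo.
if not Doc.has_extension("ext"):
    Doc.set_extension("ext", default=None)
if not Token.has_extension("flag"):
    Token.set_extension("flag", default=False)


def merge_with_doc_extension(docs, name, combine):
    """Merge docs, then fill one Doc-level extension using a caller-chosen rule."""
    with warnings.catch_warnings():
        # from_docs may warn about the Doc-level values it skips
        warnings.simplefilter("ignore")
        merged = Doc.from_docs(docs)
    # combine decides what the merged value should be, e.g. any, all, list
    merged._.set(name, combine([doc._.get(name) for doc in docs]))
    return merged


nlp = spacy.blank("en")
doc1, doc2 = nlp("Hello world."), nlp("How are you?")
doc1._.ext, doc2._.ext = True, False
doc1[0]._.flag = True  # Token-level extension, carried over by from_docs
merged = merge_with_doc_extension([doc1, doc2], "ext", any)
print(merged._.ext, merged[0]._.flag)
```

Passing the rule in as a function keeps the decision about conflicting values (True + False here) with the caller, since there is no single right answer for every extension.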