RDF representation of sentences
I need to represent sentences in RDF format.
In other words "John likes coke" would be automatically represented as:
Subject : John
Predicate : Likes
Object : Coke
Does anyone know where I should start? Are there any programs which can do this automatically or would I need to do everything from scratch?
Comments (2)
It looks like you want the typed dependencies of a sentence, e.g. for "John likes coke".
I'm not aware of any dependency parser that directly produces RDF. However, many of them produce parses in a standardized tab-delimited representation known as CoNLL-X, and it shouldn't be too hard to convert from CoNLL-X to RDF.
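To give an idea of what that conversion could look like, here is a minimal Python sketch. It assumes the standard ten-column CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) and Stanford-style relation names (nsubj, dobj). The sample rows are written by hand rather than taken from a parser run, and the `http://example.org/` namespace and all function names are made up for illustration.

```python
# Hypothetical namespace for illustration; replace with your own vocabulary.
BASE = "http://example.org/"

# Hand-written CoNLL-X rows for "John likes coke" (not real parser output).
CONLL = (
    "1\tJohn\tjohn\tNNP\tNNP\t_\t2\tnsubj\t_\t_\n"
    "2\tlikes\tlike\tVBZ\tVBZ\t_\t0\troot\t_\t_\n"
    "3\tcoke\tcoke\tNN\tNN\t_\t2\tdobj\t_\t_\n"
)

def parse_conll(text):
    """Map token id -> (form, lemma, head id, dependency relation)."""
    tokens = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate sentences in CoNLL-X
        cols = line.split("\t")
        tokens[int(cols[0])] = (cols[1], cols[2], int(cols[6]), cols[7])
    return tokens

def svo_triples(tokens):
    """Pair every nsubj with every dobj attached to the same head verb."""
    triples = []
    for form, _, head, rel in tokens.values():
        if rel != "nsubj":
            continue
        verb_lemma = tokens[head][1]
        for obj_form, _, obj_head, obj_rel in tokens.values():
            if obj_head == head and obj_rel == "dobj":
                triples.append((form, verb_lemma, obj_form))
    return triples

def to_ntriples(triples):
    """Serialize as N-Triples; word forms are not URI-escaped in this sketch."""
    return "\n".join(
        f"<{BASE}{s}> <{BASE}{p}> <{BASE}{o}> ." for s, p, o in triples
    )

print(to_ntriples(svo_triples(parse_conll(CONLL))))
```

This only covers simple subject-verb-object sentences; passives, copulas, and prepositional objects would each need their own mapping rules.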
Open Source Dependency parsers
There are a number of parsers to choose from that extract typed dependencies, including the following state-of-the-art open source options:
The Stanford Parser includes a pre-trained model for parsing English. To get typed dependencies you'll need to use the flag -outputFormat typedDependencies.
For the MaltParser you can download an English model here.
The MSTParser includes a small 200-sentence English training set that you can use to create your own English parsing model. However, training on so little data will hurt the accuracy of the resulting parser. So, if you decide to use this parser, you are probably better off using the pretrained model available here.
All of the pretrained models linked above produce parses according to the Stanford Dependencies formalism (ACL paper and manual).
Of these three, the Stanford Parser is the most accurate. The MaltParser is the fastest, with some configurations of this package being able to parse 1800 sentences in only 8 seconds.
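If you go with the Stanford Parser's typed-dependency text output instead of CoNLL-X, a similar sketch applies. It assumes one relation per line in the form rel(governor-index, dependent-index); the sample lines below are hand-written in that form rather than copied from a parser run, and the function names are hypothetical.

```python
import re

# One relation per line, e.g. "nsubj(likes-2, John-1)".
DEP = re.compile(r"^(\S+?)\((.+)-(\d+), (.+)-(\d+)\)$")

# Hand-written sample in the assumed format (not real parser output).
SAMPLE = """nsubj(likes-2, John-1)
root(ROOT-0, likes-2)
dobj(likes-2, coke-3)"""

def read_dependencies(text):
    """Return (relation, (governor, index), (dependent, index)) tuples."""
    deps = []
    for line in text.splitlines():
        match = DEP.match(line.strip())
        if match:
            rel, gov, gov_i, dep, dep_i = match.groups()
            deps.append((rel, (gov, int(gov_i)), (dep, int(dep_i))))
    return deps

def subject_predicate_object(deps):
    """Join nsubj and dobj relations that share the same governing verb."""
    subjects = {gov: dep for rel, gov, dep in deps if rel == "nsubj"}
    return [
        (subjects[gov][0], gov[0], dep[0])
        for rel, gov, dep in deps
        if rel == "dobj" and gov in subjects
    ]

print(subject_predicate_object(read_dependencies(SAMPLE)))
```

The resulting (subject, predicate, object) tuples can then be written out as RDF with whatever vocabulary you choose.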
One option is to use output from Link Parser, available under a GPL-compatible license. You can define a translation layer between these outputs and your RDF nodes as needed.
Check out this demo on your "John likes coke" example!