RDF representation of sentences

Posted on 2024-08-30 07:18:37

I need to represent sentences in RDF format.

In other words "John likes coke" would be automatically represented as:

Subject : John
Predicate : Likes
Object : Coke

Does anyone know where I should start? Are there any programs which can do this automatically or would I need to do everything from scratch?

ゝ杯具 2024-09-06 07:18:37

It looks like you want the typed dependencies of a sentence, e.g. for "John likes coke":

 nsubj(likes-2, John-1)
 dobj(likes-2, coke-3)
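As a rough illustration, the notation above can be read mechanically. The sketch below assumes every line has the shape `relation(governor-index, dependent-index)` exactly as in the two lines shown; the regular expression and the function name `parse_dependency` are made up for this example and are not part of any parser's API.

```python
import re

# Assumed line shape: relation(governor-index, dependent-index)
# e.g. "nsubj(likes-2, John-1)". This pattern is a guess based on the
# two example lines, not an official grammar of any parser's output.
DEP_RE = re.compile(r"^\s*([\w:]+)\((.+)-(\d+), (.+)-(\d+)\)\s*$")


def parse_dependency(line):
    """Return (relation, (governor, index), (dependent, index)), or None
    if the line does not match the assumed shape."""
    match = DEP_RE.match(line)
    if match is None:
        return None
    rel, gov, gov_idx, dep, dep_idx = match.groups()
    return rel, (gov, int(gov_idx)), (dep, int(dep_idx))


for line in [" nsubj(likes-2, John-1)", " dobj(likes-2, coke-3)"]:
    print(parse_dependency(line))
```

The governor of both relations is the verb `likes`, which is what lets the subject and object be paired into a single statement later on.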

I'm not aware of any dependency parser that directly produces RDF. However, many of them produce parses in a standardized tab-delimited representation known as CoNLL-X, and it shouldn't be too hard to convert from CoNLL-X to RDF.
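A minimal sketch of such a conversion follows. It assumes the standard ten tab-separated CoNLL-X columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) and the Stanford relation labels `nsubj` and `dobj`. The `http://example.org/` namespace, the sample input, and the helper names are placeholders chosen for illustration, and pairing a verb's subject with its direct object is only one simple way to build a triple.

```python
from urllib.parse import quote

# Hypothetical namespace; substitute your own vocabulary.
BASE = "http://example.org/"

# Hand-written CoNLL-X sample for "John likes coke" (illustrative only).
CONLLX = (
    "1\tJohn\tJohn\tNNP\tNNP\t_\t2\tnsubj\t_\t_\n"
    "2\tlikes\tlike\tVBZ\tVBZ\t_\t0\troot\t_\t_\n"
    "3\tcoke\tcoke\tNN\tNN\t_\t2\tdobj\t_\t_\n"
)


def read_sentence(block):
    """Parse one CoNLL-X sentence into a dict keyed by token ID."""
    tokens = {}
    for line in block.strip().splitlines():
        cols = line.split("\t")
        tokens[int(cols[0])] = {
            "form": cols[1],       # FORM column
            "head": int(cols[6]),  # HEAD column (0 means root)
            "deprel": cols[7],     # DEPREL column
        }
    return tokens


def svo_triples(tokens):
    """Pair each head's nsubj with its dobj as (subject, predicate, object)."""
    subjects, objects = {}, {}
    for tok in tokens.values():
        if tok["deprel"] == "nsubj":
            subjects[tok["head"]] = tok["form"]
        elif tok["deprel"] == "dobj":
            objects[tok["head"]] = tok["form"]
    return [
        (subjects[h], tokens[h]["form"], objects[h])
        for h in sorted(subjects)
        if h in objects
    ]


def to_ntriples(triples):
    """Serialize triples as N-Triples lines under the placeholder namespace."""
    return "\n".join(
        f"<{BASE}{quote(s)}> <{BASE}{quote(p)}> <{BASE}{quote(o)}> ."
        for s, p, o in triples
    )


triples = svo_triples(read_sentence(CONLLX))
print(to_ntriples(triples))
# prints: <http://example.org/John> <http://example.org/likes> <http://example.org/coke> .
```

Real sentences will need more care (passives, prepositional objects, multi-word names), so treat this as a starting point rather than a complete converter.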

Open Source Dependency parsers

There are a number of parsers to choose from that extract typed dependencies, including the following state-of-the-art open source options:

The Stanford Parser includes a pre-trained model for parsing English. To get typed dependencies you'll need to use the flag -outputFormat typedDependencies.

For the MaltParser you can download an English model here.

The MSTParser includes a small 200-sentence English training set that you can use to create your own English parsing model. However, training on so little data will hurt the accuracy of the resulting parser. So, if you decide to use this parser, you are probably better off using the pretrained model available here.

All of the pretrained models linked above produce parses according to the Stanford Dependency formalism (ACL paper, and manual).

Of these three, the Stanford Parser is the most accurate. The MaltParser is the fastest, with some configurations of this package being able to parse 1800 sentences in only 8 seconds.
