How do you tell two blank nodes apart in RDF?

Posted on 2024-11-19 11:05:33

I am having difficulty understanding a passage from w3.org. The confusing passage may be an error, or I may just be confused.

The following is Section 6.6 of the RDF Concepts specification:

6.6 Blank Nodes

The blank nodes in an RDF graph are drawn from an infinite set. This set of blank nodes, the set of all RDF URI references and the set of all literals are pairwise disjoint.

Otherwise, this set of blank nodes is arbitrary.

RDF makes no reference to any internal structure of blank nodes. Given two blank nodes, it is possible to determine whether or not they are the same.

So, the thing I'm confused about is: if there is no way to know the "internal structure of blank nodes", how can one tell them apart? Is this a typo?

Comments (3)

夜血缘 2024-11-26 11:05:33

It is not a typo, and I agree that it is not straightforward to understand. This is also a recurring issue. Blank nodes exist because sometimes there is no way to create a URI to represent a node. This happens all the time in OWL when constructing constraints, for example.

A blank node ID is normally created when the RDF file is parsed, and it must be unique. So, by definition, you shouldn't find two blank nodes with the same identifier. One way of distinguishing between two blank nodes is to look at all the incoming and outgoing predicates plus their objects/subjects, to see whether the connected sub-graphs are identical. This is hard to implement, and it can be very expensive to compute for large graphs.
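To make the sub-graph comparison concrete, here is a minimal brute-force sketch in Python. It assumes triples are plain tuples of strings and that any term starting with `_:` is a blank node label; `is_blank`, `blank_nodes` and `isomorphic` are hypothetical names for this illustration, not part of any RDF library. It tries every possible renaming of blank nodes, which is exactly why the approach gets expensive on large graphs.

```python
from itertools import permutations

def is_blank(term):
    # Assumption for this sketch: blank node labels start with "_:".
    return term.startswith("_:")

def blank_nodes(graph):
    # Collect every blank node label used anywhere in the graph.
    return sorted({t for triple in graph for t in triple if is_blank(t)})

def isomorphic(g1, g2):
    """Return True if some renaming of blank nodes turns g1 into g2."""
    g1, g2 = set(g1), set(g2)
    b1, b2 = blank_nodes(g1), blank_nodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    # Try every bijection between the two sets of labels (factorial cost).
    for perm in permutations(b2):
        mapping = dict(zip(b1, perm))
        renamed = {tuple(mapping.get(t, t) for t in triple) for triple in g1}
        if renamed == g2:
            return True
    return False

g1 = {("_:a", "ex:name", '"Alice"'),
      ("_:a", "ex:knows", "_:b"),
      ("_:b", "ex:name", '"Bob"')}
g2 = {("_:x", "ex:name", '"Bob"'),
      ("_:y", "ex:knows", "_:x"),
      ("_:y", "ex:name", '"Alice"')}
same = isomorphic(g1, g2)
```

The labels differ between `g1` and `g2`, but the shape around each blank node is the same, so the two graphs describe the same thing. Real implementations avoid the factorial search by pruning candidates first.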

This problem has been widely discussed in connection with finding differences between RDF graphs. One very interesting article is one of TimBL's design issues, "Delta: an ontology for the distribution of differences between RDF graphs". Also have a look at the "How to diff RDF graphs" wiki page from the W3C.

If you are the data publisher, try to avoid blank nodes if possible. If you need blank nodes, try to come up with a hash function that gives a unique ID to each distinct blank node construction, in such a way that two blank nodes with the same graph structure get the same ID while nodes with different structures get different IDs; that way you can tell them apart.
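One possible shape for such a hash function, as a rough Python sketch: it hashes the outgoing predicate/object pairs of a blank node and recurses into nested blank nodes. The triple representation (tuples of strings, `_:` prefix for blank nodes), the `h:` prefix and the name `structural_id` are all assumptions made for this example. It only handles acyclic descriptions and deliberately gives up on cycles.

```python
import hashlib

def is_blank(term):
    # Assumption for this sketch: blank node labels start with "_:".
    return term.startswith("_:")

def structural_id(node, graph, _path=()):
    """Derive an ID from what a blank node says, not from its label."""
    if node in _path:
        raise ValueError("cyclic blank node structure; this sketch cannot hash it")
    parts = []
    for s, p, o in graph:
        if s == node:
            # Nested blank nodes are replaced by their own structural ID.
            value = structural_id(o, graph, _path + (node,)) if is_blank(o) else o
            parts.append(f"{p} {value}")
    # Sorting makes the result independent of triple order.
    digest = hashlib.sha256("\n".join(sorted(parts)).encode("utf-8")).hexdigest()
    return "h:" + digest[:16]

graph = [
    ("_:a1", "ex:street", '"1 Main St"'),
    ("_:a1", "ex:city", '"Springfield"'),
    ("_:a2", "ex:city", '"Springfield"'),
    ("_:a2", "ex:street", '"1 Main St"'),
    ("_:a3", "ex:street", '"2 Main St"'),
    ("_:a3", "ex:city", '"Springfield"'),
]
id1 = structural_id("_:a1", graph)
id2 = structural_id("_:a2", graph)
id3 = structural_id("_:a3", graph)
```

Here `_:a1` and `_:a2` end up with the same ID and `_:a3` with a different one. Treating structurally identical nodes as the same thing is a modelling decision, so this only makes sense when the publisher really intends that.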

酒绊 2024-11-26 11:05:33

Note that RDF 1.1, standardised in February 2014, slightly edits this text:

Blank nodes are disjoint from IRIs and literals. Otherwise, the set of possible blank nodes is arbitrary. RDF makes no reference to any internal structure of blank nodes.

and adds a note about blank node identifiers:

Note:
Blank node identifiers are local identifiers that are used in some concrete RDF syntaxes or RDF store implementations. They are always locally scoped to the file or RDF store, and are not persistent or portable identifiers for blank nodes. Blank node identifiers are not part of the RDF abstract syntax, but are entirely dependent on the concrete syntax or implementation. The syntactic restrictions on blank node identifiers, if any, therefore also depend on the concrete RDF syntax or implementation. Implementations that handle blank node identifiers in concrete syntaxes need to be careful not to create the same blank node from multiple occurrences of the same blank node identifier except in situations where this is supported by the syntax.

There is also a new section of the spec that recommends a skolemisation scheme for blank node management.

In any case, you say that:

there is no way to know the "internal structure of blank nodes"

but this is not what the spec says. The spec simply says that it does not define such a way, which means that it is the responsibility of the implementers to decide how they want to internally represent and identify blank nodes. But I agree that the wording of the 2004 spec is confusing.
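As an illustration of what "left to the implementers" can look like, here is a small Python sketch. It is one possible design, not how any particular RDF library does it. A blank node is an opaque object compared by identity, each parsed document gets its own label table (so the same label in two files yields two different nodes), and skolemisation mints an IRI per node. The class and function names and the `https://example.org` base are placeholders; the `/.well-known/genid/` path follows the scheme suggested in RDF 1.1 Concepts.

```python
import uuid

class BlankNode:
    """Opaque node: no fields, equality is plain object identity."""
    __slots__ = ()

class DocumentScope:
    """Maps the blank node labels of ONE document to blank nodes."""
    def __init__(self):
        self._by_label = {}

    def node(self, label):
        # The same label within one document always yields the same node.
        if label not in self._by_label:
            self._by_label[label] = BlankNode()
        return self._by_label[label]

def skolemize(node, table, base="https://example.org"):
    """Replace a blank node by a freshly minted, globally unique IRI."""
    if node not in table:
        table[node] = f"{base}/.well-known/genid/{uuid.uuid4().hex}"
    return table[node]

doc1, doc2 = DocumentScope(), DocumentScope()
a = doc1.node("b0")
b = doc2.node("b0")   # same label, different document

table = {}
iri_a = skolemize(a, table)
iri_b = skolemize(b, table)
```

With this representation, "are these two blank nodes the same?" is just `a is b`, which needs no internal structure at all. The labels only exist at the parsing boundary.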

俏︾媚 2024-11-26 11:05:33

There is an algorithm discussed in this draft W3C Community Group report:

RDF Dataset Normalization

A Standard RDF Dataset Normalization Algorithm

...

This document outlines an algorithm for generating a normalized RDF dataset given an RDF dataset as input. The algorithm is called the Universal RDF Dataset Normalization Algorithm 2015 or URDNA2015.

...

This specification defines an algorithm for creating stable blank node identifiers repeatably for different serializations possibly using individualized blank node identifiers of the same RDF graph (dataset) by grounding each blank node through the nodes to which it is connected, essentially creating Skolem blank node identifiers. As a result, a graph signature can be obtained by hashing a canonical serialization of the resulting normalized dataset, allowing for the isomorphism and digital signing use cases. As blank node identifiers can be stable even with other changes to a graph (dataset), in some cases it is possible to compute the difference between two graphs (datasets), for example if changes are made only to ground triples, or if new blank nodes are introduced which do not create an automorphic confusion with other existing blank nodes.

-- https://json-ld.github.io/normalization/spec/
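To give a feel for the idea of grounding each blank node through its neighbours, here is a heavily simplified Python sketch. It is not URDNA2015: it only performs a single first-degree hashing pass (the node itself is masked as `_:a` and every other blank node as `_:z`) and raises an error when two blank nodes hash the same, which is where the real algorithm does much more work. The triple representation and all function names are assumptions for this example.

```python
import hashlib

def is_blank(term):
    # Assumption for this sketch: blank node labels start with "_:".
    return term.startswith("_:")

def first_degree_hash(node, graph):
    # Hash every triple that mentions the node, with labels masked out.
    lines = []
    for triple in graph:
        if node in triple:
            masked = ["_:a" if t == node else "_:z" if is_blank(t) else t
                      for t in triple]
            lines.append(" ".join(masked) + " .")
    return hashlib.sha256("\n".join(sorted(lines)).encode("utf-8")).hexdigest()

def canonical_labels(graph):
    nodes = {t for triple in graph for t in triple if is_blank(t)}
    by_hash = {}
    for n in nodes:
        by_hash.setdefault(first_degree_hash(n, graph), []).append(n)
    if any(len(group) > 1 for group in by_hash.values()):
        raise ValueError("blank nodes share a hash; the full algorithm is needed")
    # Stable labels are assigned in hash order, independent of input labels.
    return {by_hash[h][0]: f"_:c14n{i}" for i, h in enumerate(sorted(by_hash))}

def graph_signature(graph):
    labels = canonical_labels(graph)
    lines = sorted(" ".join(labels.get(t, t) for t in triple) + " ."
                   for triple in graph)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()

g1 = [("_:a", "ex:name", '"Alice"'),
      ("_:a", "ex:knows", "_:b"),
      ("_:b", "ex:name", '"Bob"')]
g2 = [("_:n7", "ex:name", '"Bob"'),
      ("_:n3", "ex:name", '"Alice"'),
      ("_:n3", "ex:knows", "_:n7")]
sig1, sig2 = graph_signature(g1), graph_signature(g2)
```

The two graphs differ only in blank node labels and triple order, so their signatures match. Changing any ground term changes the signature.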
