I am having difficulty understanding a passage from w3.org. The confusing passage may be an error, or I may just be confused.
The following is Section 6.6 of the RDF Concepts Specification:
6.6 Blank Nodes
The blank nodes in an RDF graph are drawn from an infinite set. This set of blank nodes, the set of all RDF URI references and the set of all literals are pairwise disjoint.
Otherwise, this set of blank nodes is arbitrary.
RDF makes no reference to any internal structure of blank nodes. Given two blank nodes, it is possible to determine whether or not they are the same.
So, the thing I'm confused about is: if there is no way to know the "internal structure of blank nodes", how can one tell them apart? Is this a typo?
Comments (3)
It is not a typo, and I agree that it is not straightforward to understand. This is also a recurrent issue. Blank nodes exist because sometimes there is no way to create a URI to represent a node. This happens all the time in OWL when constructing constraints, for example.
A blank node ID is normally created when the RDF file is parsed, and it must be unique. So, by definition, you shouldn't find two distinct blank nodes with the same identifier. One way to distinguish between two blank nodes is to look at all the incoming and outgoing predicates, plus their subjects and objects, in order to see whether the connected sub-graphs are identical. This is hard to implement and can be very expensive to compute for large graphs.
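As a rough illustration of the neighbourhood comparison described above, here is a minimal Python sketch. It is not a full solution: it only looks one hop around each node, whereas comparing whole connected sub-graphs is a graph isomorphism problem. The `Triple` type, the `_:` prefix convention for blank nodes, the `ex:` terms, and the `neighbourhood` helper are all assumptions made for this example, not part of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    s: str
    p: str
    o: str


def is_blank(term: str) -> bool:
    # Assumption: blank nodes are written with the "_:" prefix, as in N-Triples.
    return term.startswith("_:")


def neighbourhood(graph, node):
    """One-hop description of a node: its outgoing and incoming edges.

    Other blank nodes are replaced by a placeholder, because their labels
    carry no meaning outside the document they were parsed from.
    """
    edges = set()
    for t in graph:
        if t.s == node:
            edges.add(("out", t.p, "_:" if is_blank(t.o) else t.o))
        if t.o == node:
            edges.add(("in", t.p, "_:" if is_blank(t.s) else t.s))
    return frozenset(edges)


# Hypothetical data: _:b1 and _:b2 are described identically, _:b3 is not.
graph = [
    Triple("_:b1", "ex:name", '"Alice"'),
    Triple("_:b2", "ex:name", '"Alice"'),
    Triple("_:b3", "ex:name", '"Bob"'),
    Triple("ex:doc", "ex:author", "_:b1"),
    Triple("ex:doc", "ex:author", "_:b2"),
]

same_shape = neighbourhood(graph, "_:b1") == neighbourhood(graph, "_:b2")
different_shape = neighbourhood(graph, "_:b1") != neighbourhood(graph, "_:b3")
```

Note that `_:b1` and `_:b2` are still two distinct nodes in the graph; the comparison only shows that nothing in their immediate surroundings tells them apart.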
This problem has been widely discussed in connection with finding differences between RDF graphs. One very interesting article is one of TimBL's Design Issues, "Delta: an ontology for the distribution of differences between RDF graphs". Also have a look at the W3C wiki page "How to diff RDF graphs".
If you are the data publisher, then try to avoid blank nodes if possible. If you do need blank nodes, try to come up with a hash function that derives an ID from the structure around each blank node, so that two blank nodes with the same graph structure get the same ID and nodes with different structures get different IDs. That lets you tell them apart.
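A hash function of the kind suggested above might look something like the following Python sketch. It is only an assumption about how one could do it: the `structural_id` name, the `SELF` marker, the `_:h` label prefix, and the sample address data are invented for illustration, and the hash covers only the triples that directly mention the node. Two nodes that differ only further out in the graph would still collide.

```python
import hashlib


def structural_id(triples, node):
    """Derive a label for `node` from the triples that mention it."""

    def mask(term):
        # Blank node labels are arbitrary, so hide them before hashing.
        return "_:" if term.startswith("_:") else term

    lines = sorted(
        " ".join(
            (
                "SELF" if s == node else mask(s),
                p,
                "SELF" if o == node else mask(o),
            )
        )
        for s, p, o in triples
        if node in (s, o)
    )
    digest = hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()
    return "_:h" + digest[:16]


# Hypothetical data: _:x and _:y describe the same address, _:z does not.
triples = [
    ("_:x", "ex:street", '"1 Main St"'),
    ("_:x", "ex:city", '"Springfield"'),
    ("_:y", "ex:city", '"Springfield"'),
    ("_:y", "ex:street", '"1 Main St"'),
    ("_:z", "ex:city", '"Shelbyville"'),
]

id_x = structural_id(triples, "_:x")
id_y = structural_id(triples, "_:y")
id_z = structural_id(triples, "_:z")
```

Sorting the lines before hashing makes the result independent of the order in which the triples were parsed, which is why `_:x` and `_:y` end up with the same ID here.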
Note that RDF 1.1, standardised in February 2014, slightly edits this text:
and adds a note about blank node identifiers:
There is also a new piece of spec that recommends a skolemisation scheme for blank node management.
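To give an idea of what skolemisation means in practice, here is a small Python sketch that replaces each blank node with a freshly minted IRI. RDF 1.1 suggests IRIs under the `/.well-known/genid/` path for this purpose; everything else here (the `skolemize` function, the `https://example.org` authority, the use of random UUIDs, the `_:` prefix test, and the sample triples) is an assumption made for the example.

```python
import uuid


def skolemize(triples, authority="https://example.org"):
    """Replace every blank node with a fresh, globally unique IRI."""
    mapping = {}

    def replace(term):
        if not term.startswith("_:"):
            return term
        if term not in mapping:
            # One new IRI per blank node; repeated uses map to the same IRI.
            mapping[term] = f"{authority}/.well-known/genid/{uuid.uuid4().hex}"
        return mapping[term]

    return [(replace(s), p, replace(o)) for s, p, o in triples], mapping


# Hypothetical data with two blank nodes, one of them used twice.
triples = [
    ("ex:doc", "ex:author", "_:a"),
    ("_:a", "ex:name", '"Alice"'),
    ("ex:doc", "ex:reviewer", "_:b"),
]

skolemized, mapping = skolemize(triples)
```

After this step every node has a name that can be compared like any other IRI, which sidesteps the question of how to tell blank nodes apart.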
In any case, you say that:
but this is not what the spec says. The spec simply says that it does not define such a way, which means that it is the responsibility of the implementers to decide how they want to internally represent and identify blank nodes. But I agree that the wording of the 2004 spec is confusing.
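One way an implementation could satisfy both sentences of the spec is to make blank nodes opaque objects that are compared by identity only. The Python sketch below is just one possible choice, not something the spec mandates; the `BlankNode` class is hypothetical.

```python
class BlankNode:
    """A blank node with no internal structure: no name, no fields."""

    __slots__ = ()
    # No __eq__ or __hash__ is defined, so Python falls back to object
    # identity: a node is equal to itself and to nothing else.


a = BlankNode()
b = BlankNode()
alias = a  # a second reference to the same node

nodes = {a, b, alias}
```

Here `a == alias` holds and `a == b` does not, even though there is nothing inside either object to inspect. That is the sense in which two blank nodes can be told apart without any internal structure.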
There is an algorithm discussed in this draft W3C Community Group report:
-- https://json-ld.github.io/normalization/spec/