Duplicate triples in RDF: authoritative view?

Posted 2024-08-14 11:48:51

If a triple store contains the same triple twice, what is the authoritative position (if one exists) on this redundancy?

Additionally, should a triple store be allowed to store the same triple twice within the same context?

I ask because rdflib apparently lets you store the same triple twice (or more). This is the reader:

import rdflib
from rdflib import store

s = rdflib.plugin.get('MySQL', store.Store)('rdfstore')

config_string = "host=localhost,password=foo,user=foo,db=foo"
rt = s.open(config_string,create=False)
if rt != store.VALID_STORE:
    s.open(config_string,create=True)

graph = rdflib.ConjunctiveGraph(s, identifier = rdflib.URIRef("urn:uuid:a19f9b78-cc43-4866-b9a1-4b009fe91f52"))
rows = graph.query("SELECT ?id ?value { ?id <http://localhost#ha> ?value . }")
for r in rows:
    print r[0], r[1]

And this is the writer:

import rdflib
from rdflib import store

s = rdflib.plugin.get('MySQL', store.Store)('rdfstore')

config_string = "host=localhost,password=foo,user=foo,db=foo"
rt = s.open(config_string,create=False)
if rt != store.VALID_STORE:
    s.open(config_string,create=True)

graph = rdflib.ConjunctiveGraph(s, identifier = rdflib.URIRef("urn:uuid:a19f9b78-cc43-4866-b9a1-4b009fe91f52"))
graph.add( ( rdflib.URIRef("http://localhost/1000"), rdflib.URIRef("http://localhost#ha"), rdflib.Literal("18")) )
graph.commit()

This is what I get:

sbo@dhcp-045:~/tmp/gd $ python ./reader2.py 
table kb_7b066eca61_relations Doesn't exist
table kb_7b066eca61_relations Doesn't exist
sbo@dhcp-045:~/tmp/gd $ python ./reader2.py 
sbo@dhcp-045:~/tmp/gd $ python ./reader2.py 
sbo@dhcp-045:~/tmp/gd $ python ./writer2.py 
sbo@dhcp-045:~/tmp/gd $ python ./reader2.py 
http://localhost/1000 18
sbo@dhcp-045:~/tmp/gd $ python ./writer2.py 
sbo@dhcp-045:~/tmp/gd $ python ./reader2.py 
http://localhost/1000 18
http://localhost/1000 18

To me this looks like a bug. A modified version shows that both triples belong to the same context, and that there are indeed two triples:

len : 2
http://localhost/1000 18
http://localhost/1000 18
(rdflib.URIRef('http://localhost/1000'), rdflib.URIRef('http://localhost#ha'), rdflib.Literal(u'18'), <Graph identifier=urn:uuid:a19f9b78-cc43-4866-b9a1-4b009fe91f52 (<class 'rdflib.Graph.Graph'>)>)
(rdflib.URIRef('http://localhost/1000'), rdflib.URIRef('http://localhost#ha'), rdflib.Literal(u'18'), <Graph identifier=urn:uuid:a19f9b78-cc43-4866-b9a1-4b009fe91f52 (<class 'rdflib.Graph.Graph'>)>)


Comments (3)

静水深流 2024-08-21 11:48:51

An RDF triple store is a set of triples, so by definition the same triple cannot be present twice. However, most RDF stores are actually quad stores (sets of RDF graphs, also known as datasets), and in that case the same triple may appear multiple times, once per graph. The graph is sometimes called a context, depending on the store (e.g. mine, Redland). As for authority, it is really up to the user to define what meaning a particular graph name or context name has.
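The difference between the two models can be sketched with plain Python sets. This is only an illustration of the semantics, not how any particular store is implemented; the `urn:example:` graph names are made up for the example.

```python
# Triples and quads modeled as plain tuples; all names are illustrative.
triple = ("http://localhost/1000", "http://localhost#ha", "18")

# A single RDF graph is a set of triples: adding a duplicate has no effect.
graph = set()
graph.add(triple)
graph.add(triple)
print(len(graph))  # 1

# A quad store keys each statement by (graph name, triple), so the same
# triple can be held once per named graph/context, but not twice in one.
quads = set()
quads.add(("urn:example:graph-a", *triple))
quads.add(("urn:example:graph-b", *triple))
quads.add(("urn:example:graph-a", *triple))  # duplicate within the same context
print(len(quads))  # 2
```

Under this reading, two identical triples reported in the same context, as in the question, would not be expected from a store that follows set semantics.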

如果没有你 2024-08-21 11:48:51

One should keep in mind that any particular triple may carry different metadata than other, otherwise identical, triples: the original source of the triple, the possible strength of the connection, and so on. It may also be feasible to simply count the copies of a triple in order to judge the relative strength of a connection against other, possibly contradictory, connections. So, as always, it all depends on what you intend to do with your data.
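A minimal sketch of the counting idea, assuming each statement is tagged with the source that asserted it. The sources, subjects, and values below are hypothetical, and counting distinct sources is just one possible way to weigh contradictory claims.

```python
from collections import Counter

# Hypothetical statements, each tagged with the source that asserted it.
statements = [
    ("urn:example:source-a", ("alice", "age", "30")),
    ("urn:example:source-b", ("alice", "age", "30")),
    ("urn:example:source-b", ("alice", "age", "30")),  # repeated by the same source
    ("urn:example:source-c", ("alice", "age", "31")),
]

# Count how many distinct sources assert each triple, so that a source
# repeating itself does not inflate the score.
support = Counter()
seen = set()
for source, triple in statements:
    if (source, triple) not in seen:
        seen.add((source, triple))
        support[triple] += 1

for triple, count in support.most_common():
    print(triple, count)
```

Here the triple asserted by two sources outranks the contradictory one asserted by a single source.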

莳間冲淡了誓言ζ 2024-08-21 11:48:51

RDF is a language for expressing factual claims, organized and grouped into graphs. If a graph contains "Alice is a Person" twice, that's just redundant. So within a graph, triples are normalised; there's no point in repeating them. However, applications, stores and SPARQL-queryable systems will often collect factual claims from different sources. The SPARQL language has the 'GRAPH' keyword for when you want to take a multi-graph perspective and look for the same triple in different sources.
