Named graphs and federating SPARQL endpoints
I recently came across the working draft for SPARQL 1.1 Federation Extensions and wondered whether this was already possible using Named Graphs (not to detract from the usefulness of the aforementioned draft).
My understanding of Named Graphs is a little hazy: the only thing I have gleaned from reading the specs is the set of rules about whether graphs are merged or kept separate at query time. Since this doesn't fully satisfy my understanding, my question is as follows:
Given the following query:
SELECT ?something
FROM NAMED <http://www.vw.co.uk/models/used>
FROM NAMED <http://www.autotrader.co.uk/cars/used>
WHERE {
...
}
Is it reasonable to assume that a query processor/endpoint could or should in the context of the named graphs do the following:
Check if the named graph exists locally
If it doesn't, then perform the following operation (in the case of the above query, I will use the second named graph)
GET /sparql/?query=EncodedQuery HTTP/1.1
Host: www.autotrader.co.uk
User-agent: my-sparql-client/0.1
Where the EncodedQuery only includes the second named graph in the FROM NAMED clause and the WHERE clause is amended accordingly with respect to GRAPH clauses (e.g. if a GRAPH <http://www.vw.co.uk/models/used> {...} is being used).
Only if it can't perform the above, then do any of the following:
GET /cars/used HTTP/1.1
Host: www.autotrader.co.uk
or
LOAD <http://www.autotrader.co.uk/cars/used>
- Return appropriate search results.
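The fallback order proposed above can be sketched in Python. This only illustrates the proposal and is not behaviour that SPARQL requires of a query processor. The helper names (`endpoint_for`, `remote_query_url`, `plan_graph_access`), the assumption that an endpoint lives at `/sparql/` on the graph's host, and the `remote_endpoint_ok` flag are all hypothetical.

```python
from urllib.parse import urlencode, urlsplit


def endpoint_for(graph_iri: str) -> str:
    """Guess an endpoint URL from a graph IRI (assumed convention: <scheme>://<host>/sparql/)."""
    parts = urlsplit(graph_iri)
    return f"{parts.scheme}://{parts.netloc}/sparql/"


def remote_query_url(graph_iri: str, where_body: str) -> str:
    """Build the GET URL for a query that names only this one graph."""
    query = (
        "SELECT ?something\n"
        f"FROM NAMED <{graph_iri}>\n"
        f"WHERE {{ GRAPH <{graph_iri}> {{ {where_body} }} }}"
    )
    return endpoint_for(graph_iri) + "?" + urlencode({"query": query})


def plan_graph_access(graph_iri, local_graphs, remote_endpoint_ok):
    """Return which of the three proposed strategies would be tried for a graph."""
    if graph_iri in local_graphs:
        return ("local", graph_iri)
    if remote_endpoint_ok:
        return ("remote-query", remote_query_url(graph_iri, "?s ?p ?something"))
    # Last resort: dereference the graph IRI itself (plain GET or a LOAD).
    return ("dereference", graph_iri)
```

Calling `plan_graph_access("http://www.autotrader.co.uk/cars/used", {"http://www.vw.co.uk/models/used"}, True)` would yield a `remote-query` plan whose URL targets `http://www.autotrader.co.uk/sparql/`, since that graph is not held locally.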
Obviously there might be some additional considerations around OFFSETs and LIMITs.
I also remember reading somewhere, a long time ago in a galaxy far, far away, that the default graph of any SPARQL endpoint should be a named graph according to the following convention:
For http://www.vw.co.uk/sparql/ there should be a named graph http://www.vw.co.uk that represents the default graph, and so, by the above logic, it should already be possible to federate SPARQL endpoints using named graphs.
The reason I ask is that I want to start promoting federation across the domains in the above example, without having to wait around for the standard, making sure that I won't do something that is out of kilter or incompatible with something else in the future.
Comments (1)
Named graphs and the URLs used in federated queries (using SERVICE or FROM) are two different things. The latter point to SPARQL endpoints, whereas named graphs live within a triple store and have the main function of separating different data sets. This, in turn, can be useful both to improve performance and to represent knowledge, such as the source of a set of statements.
For instance, you might have two data sources both stating that
?movie has-rating ?x
and you might want to know which source is stating which rating. In this case you can use two named graphs, one associated with each source (e.g., http://www.example.com/rotten-tomatoes and http://www.example.com/imdb). If you're storing both data sets in the same triple store, you will probably want to use NGs; remote endpoints are a different thing. Furthermore, the URL of a named graph can be used with vocabularies like VoID to describe a dataset as a whole (e.g., the data set name, where and when the triples were imported from, who the maintainer is, the user licence). This is another reason to partition your triple store into NGs. That said, your mechanism to bind NGs to endpoint URLs might be implemented as an option, but I don't think it's a good idea to make it mandatory, since managing remote endpoint URLs and NGs separately can be more useful.
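To make the provenance point concrete, here is a minimal Python sketch of the two-source example, with each statement stored as a quad whose first element is the named graph. The `ex:` identifiers and the rating values are made up purely for illustration, and `ratings_by_source` is a hypothetical helper. In SPARQL the same question would be asked with a pattern such as `GRAPH ?g { ?movie has-rating ?x }`.

```python
from collections import defaultdict

# Each quad is (graph, subject, predicate, object); the graph IRI records the source.
# The subjects and rating values below are invented for illustration only.
quads = [
    ("http://www.example.com/rotten-tomatoes", "ex:movie1", "ex:has-rating", 8.1),
    ("http://www.example.com/imdb", "ex:movie1", "ex:has-rating", 7.4),
]


def ratings_by_source(quads, movie):
    """Group the has-rating values for one movie by the named graph stating them."""
    result = defaultdict(list)
    for graph, subject, predicate, obj in quads:
        if subject == movie and predicate == "ex:has-rating":
            result[graph].append(obj)
    return dict(result)
```

Without the graph component, the two ratings would be indistinguishable statements about the same movie.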
Moreover, the real challenge in federated queries is to offer endpoint-transparent queries, making the query engine smart enough to analyse the query, understand how to split it, and perform partial queries on the right endpoints (and join the results later, in an efficient way). There is a lot of research being done on that. One of the most significant results (as far as I know) is FedX, which has been used to implement several query distribution optimisations (example).
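The "join the results later" step can be illustrated with a deliberately naive Python sketch. It is not FedX's algorithm or any engine's actual implementation: it is just a nested-loop join over two lists of variable bindings, such as two endpoints might return for their parts of a query. The function name and the sample bindings are hypothetical.

```python
def join_bindings(left, right):
    """Join two lists of variable bindings on the variables they share."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            # Rows are compatible when every shared variable has the same value.
            if all(l_row[var] == r_row[var] for var in shared):
                joined.append({**l_row, **r_row})
    return joined


# Hypothetical partial results from two endpoints.
from_endpoint_a = [{"car": "ex:golf", "model": "Golf"}]
from_endpoint_b = [
    {"car": "ex:golf", "price": 9500},
    {"car": "ex:polo", "price": 7000},
]
merged = join_bindings(from_endpoint_a, from_endpoint_b)
```

A real federation engine would also have to decide which patterns go to which endpoint and avoid shipping large intermediate results, which is where the optimisation research comes in.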
One last thing to add: I vaguely remember the convention that you mention about $url and $url/sparql. There are a couple of approaches around (e.g., LOD cloud). That said, in most of today's triple stores (e.g., Virtuoso), queries that don't specify a named graph (i.e., don't use GRAPH) do not fall back to a separate default graph; they actually query the union of all named graphs in the store, which is usually much more useful (when you don't know where something is stated, or you want to integrate cross-graph data).
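The difference between the two default-graph behaviours can be shown with a small Python sketch. The `union_default` flag is a hypothetical stand-in for whatever setting a given store exposes, and the triples are made up for illustration.

```python
def default_graph(named_graphs, union_default=True, explicit_default=frozenset()):
    """Return the triples a GRAPH-less pattern would be matched against."""
    if union_default:
        # Union semantics: every triple from every named graph is visible.
        merged = set()
        for triples in named_graphs.values():
            merged |= set(triples)
        return merged
    # Otherwise only triples loaded into the separate default graph are visible.
    return set(explicit_default)


store = {
    "http://www.example.com/rotten-tomatoes": {("ex:movie1", "ex:has-rating", 8.1)},
    "http://www.example.com/imdb": {("ex:movie1", "ex:has-rating", 7.4)},
}
```

With `union_default=True` both ratings are visible to a query without GRAPH; with `union_default=False` and nothing loaded into the default graph, the same query sees no triples at all.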