Jena/ARQ: Difference between Model, Graph and Dataset

Posted 2024-11-28 13:46:43

I'm starting to work with the Jena engine and I think I have a grasp of what the semantics are.
However, I'm having a hard time understanding the different ways to represent a bunch of triples in Jena and ARQ:

  • The first thing you stumble upon when starting is Model, and the documentation says it's Jena's name for RDF graphs.
  • However, there is also Graph, which seemed to be the necessary tool when I want to query a union of models. It does not seem to share a common interface with Model, although one can get the Graph out of a Model.
  • Then there is Dataset in ARQ, which also seems to be a collection of triples of some sort.

Sure, after some looking around in the API, I found ways to somehow convert from one into another. However, I suspect there is more to it than three different interfaces for the same thing.

So, the question is: what are the key design differences between these three? When should I use which one? Especially: when I want to hold individual bunches of triples but query them as one big bunch (union), which of these data structures should I use (and why)?
Also, do I "lose" anything when "converting" from one into another (e.g. does model.getGraph() contain less information in some way than model)?

赠我空喜 2024-12-05 13:46:43

Jena is divided into an API, for application developers, and an SPI for systems developers, such as people making storage engines, reasoners etc.

Dataset, Model, Statement, Resource and Literal are API interfaces and provide many conveniences for application developers.

DatasetGraph, Graph, Triple and Node are SPI interfaces. They're pretty spartan and simple to implement (as you'd hope if you've got to implement the things).

The wide variety of API operations all resolve down to SPI calls. To give an example, the Model interface has four different contains methods. Internally, each results in a call to:

Graph#contains(Node, Node, Node)

such as

graph.contains(nodeS, nodeP, nodeO); // model.contains(s, p, o) or model.contains(statement)
graph.contains(nodeS, nodeP, Node.ANY); // model.contains(s, p)
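
To make that mapping concrete, here is a minimal sketch that asks the same question once through the API and once through the SPI. It assumes Apache Jena 3.x or later on the classpath (org.apache.jena packages); the class name ContainsDemo and the example.org URIs are made up for illustration.

```java
import org.apache.jena.graph.Graph;
import org.apache.jena.graph.Node;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;

// Hypothetical demo class; only the Jena calls themselves are real API.
public class ContainsDemo {
    // Returns true when the API-level and the SPI-level check both find the triple.
    public static boolean apiAndSpiAgree() {
        Model model = ModelFactory.createDefaultModel();
        Resource s = model.createResource("http://example.org/alice");
        Property p = model.createProperty("http://example.org/name");
        s.addProperty(p, "Alice");

        // API level: Resource and Property objects.
        boolean viaModel = model.contains(s, p);

        // SPI level: the same question on the underlying graph, with a wildcard object.
        Graph graph = model.getGraph();
        Node nodeS = s.asNode();
        Node nodeP = p.asNode();
        boolean viaGraph = graph.contains(nodeS, nodeP, Node.ANY);

        return viaModel && viaGraph;
    }

    public static void main(String[] args) {
        System.out.println("API and SPI agree: " + apiAndSpiAgree());
    }
}
```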

Concerning your question about losing information: with Model and Graph you don't (as far as I recall). The more interesting case is Resource versus Node. Resources know which model they belong to, so you can (in the API) write resource.addProperty(...), which eventually becomes a Graph#add. Node has no such convenience and is not associated with a particular Graph. Hence Resource#asNode is lossy.
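
A small sketch of what "lossy" means here, again assuming Apache Jena 3.x or later; the class name and URIs are hypothetical. The Resource remembers its model, while the Node it yields can be re-wrapped by any other model.

```java
import org.apache.jena.graph.Node;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;

// Hypothetical demo class illustrating Resource versus Node.
public class ResourceVsNodeDemo {
    public static boolean run() {
        Model home = ModelFactory.createDefaultModel();
        Model other = ModelFactory.createDefaultModel();

        // A Resource created through a model keeps a reference to that model.
        Resource alice = home.createResource("http://example.org/alice");
        boolean knowsHome = alice.getModel() == home;

        // The Node carries only the URI; the link back to 'home' is gone.
        Node node = alice.asNode();

        // Any model can wrap the same Node again and becomes its new owner.
        Resource rewrapped = other.asRDFNode(node).asResource();
        boolean attachedToOther = rewrapped.getModel() == other;

        return knowsHome && attachedToOther
                && rewrapped.getURI().equals(alice.getURI());
    }

    public static void main(String[] args) {
        System.out.println("Node dropped the model link: " + run());
    }
}
```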

Finally:

When I want to hold individual bunches of triples but query them as one big bunch (union), which of these datastructures should I use (and why)?

You're clearly a normal user, so you want the API. You want to store triples, so use Model. Now you want to query the models as one union. You could:

  • Model#union() everything, which will copy all the triples into a new model.
  • ModelFactory.createUnion() everything, which will create a dynamic union (i.e. no copying).
  • Store your models as named models in a TDB or SDB dataset store, and use the unionDefaultGraph option.

The last of these works best for large numbers of models and for large models, but is a little more involved to set up.
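
The difference between the first two options can be sketched as follows. This assumes Apache Jena 3.x or later; the class name, helper and URIs are invented for the example, and the TDB/SDB option is left out because it needs a store to be set up. A triple added after the unions are built should show up only in the dynamic one.

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.rdf.model.ResourceFactory;

// Hypothetical demo class comparing a copied union with a dynamic union.
public class UnionDemo {
    static final Property NAME =
            ResourceFactory.createProperty("http://example.org/name");

    // Adds one (subject, NAME, literal) triple to the model and returns the subject.
    static Resource addPerson(Model m, String uri, String name) {
        return m.createResource(uri).addProperty(NAME, name);
    }

    // Returns { copiedSeesLateTriple, dynamicSeesLateTriple }.
    public static boolean[] run() {
        Model a = ModelFactory.createDefaultModel();
        Model b = ModelFactory.createDefaultModel();
        addPerson(a, "http://example.org/alice", "Alice");
        addPerson(b, "http://example.org/bob", "Bob");

        Model copied = a.union(b);                      // new, independent model
        Model dynamic = ModelFactory.createUnion(a, b); // live view over a and b

        // Added after both unions were created.
        Resource carol = addPerson(a, "http://example.org/carol", "Carol");

        return new boolean[] {
            copied.contains(carol, NAME),
            dynamic.contains(carol, NAME)
        };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("copied sees late triple:  " + r[0]);
        System.out.println("dynamic sees late triple: " + r[1]);
    }
}
```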

南薇 2024-12-05 13:46:43

Short answer: Model is just a stateless wrapper with lots of convenience methods around a Graph. ModelFactory.createModelForGraph(Graph) wraps a graph in a model. Model.getGraph() gets the wrapped graph.
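
As a rough illustration of the wrapper idea, the sketch below puts two Model wrappers around one Graph and adds a triple at the SPI level. It assumes Apache Jena 3.x or later; the class name and URIs are hypothetical.

```java
import org.apache.jena.graph.Graph;
import org.apache.jena.graph.NodeFactory;
import org.apache.jena.graph.Triple;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

// Hypothetical demo class: two Model wrappers sharing one underlying Graph.
public class WrapGraphDemo {
    static final String ALICE = "http://example.org/alice";
    static final String KNOWS = "http://example.org/knows";
    static final String BOB = "http://example.org/bob";

    public static boolean run() {
        Model first = ModelFactory.createDefaultModel();
        Graph graph = first.getGraph();                         // unwrap
        Model second = ModelFactory.createModelForGraph(graph); // wrap again

        // Add a triple directly on the graph, bypassing both models.
        graph.add(Triple.create(
                NodeFactory.createURI(ALICE),
                NodeFactory.createURI(KNOWS),
                NodeFactory.createURI(BOB)));

        boolean sameGraph = second.getGraph() == graph;
        boolean firstSees = first.contains(
                first.createResource(ALICE), first.createProperty(KNOWS));
        boolean secondSees = second.contains(
                second.createResource(ALICE), second.createProperty(KNOWS));
        return sameGraph && firstSees && secondSees;
    }

    public static void main(String[] args) {
        System.out.println("both wrappers see the triple: " + run());
    }
}
```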

Most application programmers would use Model. Personally I prefer to use Graph because it's simpler. I have trouble remembering all the cruft on the Model class.

Dataset is a collection of several Models: one “default model” and zero or more “named models”. This corresponds to the notion of an “RDF dataset” in SPARQL. (Technically speaking, SPARQL is not a query language for “RDF graphs” but for “RDF datasets” which can be collections of named RDF graphs plus a default graph.)
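
A minimal sketch of a Dataset with one default model and two named models, queried through ARQ. It assumes Apache Jena 3.x or later and the plain in-memory dataset from DatasetFactory.create(), where the default graph is not the union of the named graphs; the class name, helpers and URIs are made up.

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

// Hypothetical demo class: default graph versus named graphs in a Dataset.
public class DatasetDemo {
    // Builds a model holding a single triple about the given subject.
    static Model modelWith(String subjectUri) {
        Model m = ModelFactory.createDefaultModel();
        m.createResource(subjectUri)
         .addProperty(m.createProperty("http://example.org/name"), "x");
        return m;
    }

    // Runs a SELECT query against the dataset and counts the result rows.
    static int countRows(Dataset ds, String sparql) {
        int rows = 0;
        try (QueryExecution qe = QueryExecutionFactory.create(sparql, ds)) {
            ResultSet rs = qe.execSelect();
            while (rs.hasNext()) {
                rs.next();
                rows++;
            }
        }
        return rows;
    }

    // Returns { rows matched in the default graph, rows matched in named graphs }.
    public static int[] run() {
        Dataset ds = DatasetFactory.create();
        ds.getDefaultModel().add(modelWith("http://example.org/carol"));
        ds.addNamedModel("http://example.org/g1", modelWith("http://example.org/alice"));
        ds.addNamedModel("http://example.org/g2", modelWith("http://example.org/bob"));

        int inDefault = countRows(ds, "SELECT * WHERE { ?s ?p ?o }");
        int inNamed = countRows(ds, "SELECT * WHERE { GRAPH ?g { ?s ?p ?o } }");
        return new int[] { inDefault, inNamed };
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println("default graph rows: " + r[0]);
        System.out.println("named graph rows:   " + r[1]);
    }
}
```

Without a GRAPH clause the pattern is matched against the default graph only, which is why querying "everything as one bunch" needs either a union model or a store configured to expose the union as its default graph.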
