Best way to create a SPARQL endpoint for an RDBMS (MySQL database)
I am doing (or want to do) some experiments with Linked Open Data sets, particularly those published by governments.
I have an RDBMS (more specifically MySQL). I designed it with semantic web ideas in mind, i.e. I have information stored as objects, predicates, and classes that define objects. In turn, all objects are related to each other through statements of the form subject --> predicate --> object (where the subjects come from the objects table).
I want to be able to query other RDF triple stores from my application and let other triple stores query my data. Is it possible to "set something up" so that this is possible?
I have looked at Jena. Using Jena seems to mean I have to use it as the storage application instead of MySQL. The only problem with this is that I include a new concept called a category (which I don't think is part of the semantic web languages). I will use categories to help with displaying information (they don't have any other meaning), but using Jena seems to mean that I can't organise predicates under categories for more convenient viewing.
I am using Java, so a Java API is preferred.
It's also possible I misunderstood the purpose of Jena, and maybe that can be of use, but I am not sure how.
I am sure four days from now this question will seem rather silly, but at the moment I am somewhat confused about how to proceed.
I'm not sure what you mean by "a new concept called category", perhaps you can give an example?
If you mean that you want to add additional metadata, perhaps as a way of organizing information in the user interface, there is no need to extend the semantic web languages or storage systems - they can already do what you want.
Suppose you have data for a school from the UK Government schools dataset (using Turtle encoding for brevity):
You can directly query that data from the SPARQL end-point, or you can download the data and store it locally in your own triple store. Either way, you're perfectly at liberty to add extra information that's useful to your users. For example:
You can store this new triple in the same graph as the downloaded data, or you can store it in a separate named graph to indicate that it has a different provenance from the source data. Either way, it is then simple to query it, either programmatically from Jena or via a SPARQL query.
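As a rough illustration of the named-graph option, here is a minimal Java sketch that uses only the JDK (no Jena). It keeps a source triple in the default graph and a locally added category triple in a separate named graph, then writes both as N-Quads. All URIs (`http://example.org/...`) and the category value are hypothetical placeholders, not terms from the real schools dataset, and objects are always written as plain literals for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Keeps locally added display metadata in its own named graph so that its
 * provenance stays distinct from the downloaded source data. All URIs are
 * hypothetical placeholders, not terms from the real schools dataset.
 */
public class NamedGraphSketch {

    /** One statement plus the graph it lives in (null means the default graph). */
    record Quad(String subject, String predicate, String object, String graph) {

        /** Serialises as one N-Quads line; the object is always written as a plain literal. */
        String toNQuads() {
            String escaped = object.replace("\\", "\\\\")
                                   .replace("\"", "\\\"")
                                   .replace("\n", "\\n");
            String graphPart = (graph == null) ? "" : " <" + graph + ">";
            return "<" + subject + "> <" + predicate + "> \"" + escaped + "\"" + graphPart + " .";
        }
    }

    // Hypothetical identifiers used only for this sketch.
    static final String SCHOOL = "http://example.org/school/123";
    static final String NAME = "http://example.org/def/name";
    static final String CATEGORY = "http://example.org/def/category";
    static final String UI_GRAPH = "http://example.org/graph/ui-annotations";

    static List<Quad> buildDataset() {
        List<Quad> quads = new ArrayList<>();
        // As downloaded from the source: default graph.
        quads.add(new Quad(SCHOOL, NAME, "Example Primary School", null));
        // Our own display metadata: separate named graph.
        quads.add(new Quad(SCHOOL, CATEGORY, "Schools near me", UI_GRAPH));
        return quads;
    }

    /** Returns only the statements stored in the given named graph. */
    static List<Quad> inGraph(List<Quad> quads, String graph) {
        List<Quad> result = new ArrayList<>();
        for (Quad q : quads) {
            if (graph.equals(q.graph())) {
                result.add(q);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Quad q : buildDataset()) {
            System.out.println(q.toNQuads());
        }
    }
}
```

In Jena, the equivalent separation would be adding the category triple to a named model of a `Dataset` (via `getNamedModel`) rather than to the default model; the sketch avoids Jena only so that it runs with the JDK alone.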
Designing a storage layout that can efficiently query schemaless, triple-centric data is a well-studied and hard problem. Most RDF platforms, including Jena, have well-optimised code for querying and updating triples in their own database schemas. You would need very good reasons to embark on your own relational table layout :)
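To give a feel for why the layout matters, here is a toy in-memory sketch of one common indexing idea: storing each triple under more than one ordering (subject-predicate-object and predicate-object-subject), so that both "everything about this subject" and "which subjects have this predicate/value" are direct lookups instead of full scans. This is a general technique shown for illustration only; it is not a description of how Jena or any other specific store lays out its tables, and the `ex:` names are made up.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy triple store holding the same triples in an SPO index and a POS index. */
public class TripleIndexSketch {

    // subject -> predicate -> objects
    private final Map<String, Map<String, Set<String>>> spo = new HashMap<>();
    // predicate -> object -> subjects
    private final Map<String, Map<String, Set<String>>> pos = new HashMap<>();

    /** Inserts one triple into both indexes. */
    public void add(String s, String p, String o) {
        spo.computeIfAbsent(s, k -> new HashMap<>())
           .computeIfAbsent(p, k -> new TreeSet<>())
           .add(o);
        pos.computeIfAbsent(p, k -> new HashMap<>())
           .computeIfAbsent(o, k -> new TreeSet<>())
           .add(s);
    }

    /** Objects for a subject/predicate pair, answered from the SPO index. */
    public Set<String> objects(String s, String p) {
        return Collections.unmodifiableSet(
                spo.getOrDefault(s, Map.of()).getOrDefault(p, Set.of()));
    }

    /** Subjects that have a predicate/object pair, answered from the POS index. */
    public Set<String> subjects(String p, String o) {
        return Collections.unmodifiableSet(
                pos.getOrDefault(p, Map.of()).getOrDefault(o, Set.of()));
    }

    public static void main(String[] args) {
        TripleIndexSketch store = new TripleIndexSketch();
        store.add("ex:school1", "ex:type", "ex:School");
        store.add("ex:school2", "ex:type", "ex:School");
        store.add("ex:school1", "ex:name", "Example Primary");
        System.out.println(store.subjects("ex:type", "ex:School"));
        System.out.println(store.objects("ex:school1", "ex:name"));
    }
}
```

Real stores make the same trade-off on disk: extra index orderings cost space and write time but avoid scanning one giant triples table for every query pattern.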
If you really do need to take an existing relational table scheme and map it to a Jena RDF model, look at D2RQ.
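The core idea behind relational-to-RDF mapping can be sketched by hand: each row becomes a subject URI built from its primary key, the table name gives an `rdf:type`, and each remaining column becomes a predicate. D2RQ does this declaratively through a mapping file (and rewrites SPARQL into SQL) rather than with code like this; the sketch below is a hypothetical illustration only, and the base URI, table, and column names are made up. It also assumes key values are already URI-safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hand-rolled illustration of mapping one relational row to N-Triples lines. */
public class RowToTriplesSketch {

    // Hypothetical base URI for generated subjects and predicates.
    static final String BASE = "http://example.org/";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    /** Escapes a value for use inside an N-Triples string literal. */
    static String literal(String value) {
        return "\"" + value.replace("\\", "\\\\")
                           .replace("\"", "\\\"")
                           .replace("\n", "\\n") + "\"";
    }

    /**
     * Turns one row into N-Triples lines.
     * Assumes the key value needs no URI escaping.
     */
    static List<String> rowToTriples(String table, String keyColumn, Map<String, String> row) {
        String subject = "<" + BASE + table + "/" + row.get(keyColumn) + ">";
        List<String> triples = new ArrayList<>();
        // Type triple derived from the table name.
        triples.add(subject + " <" + RDF_TYPE + "> <" + BASE + "def/" + table + "> .");
        // One triple per non-key column, in sorted column order for stable output.
        for (String column : new TreeSet<>(row.keySet())) {
            if (column.equals(keyColumn)) {
                continue;
            }
            triples.add(subject + " <" + BASE + "def/" + table + "#" + column + "> "
                    + literal(row.get(column)) + " .");
        }
        return triples;
    }

    public static void main(String[] args) {
        Map<String, String> row = Map.of("id", "42", "name", "Example Primary", "town", "Exampletown");
        rowToTriples("school", "id", row).forEach(System.out::println);
    }
}
```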
Why didn't you just use a triple store to store all of your data? If you use a triple store with SPARQL endpoint capability, you get a SPARQL-accessible web API. Similarly, many other datasets on the web are exposed as SPARQL endpoints accessible via HTTP.
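Because SPARQL endpoints speak plain HTTP, querying a remote dataset from Java needs nothing beyond the JDK. The sketch below follows the SPARQL 1.1 Protocol "query via GET" form: the query text is URL-encoded into a `query` parameter and the `Accept` header asks for JSON results. The endpoint URL `https://example.org/sparql` is a placeholder; substitute a real endpoint.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Sketch of sending a SPARQL query to a remote endpoint over HTTP GET. */
public class SparqlHttpSketch {

    /** Builds a GET request carrying the query in the "query" parameter. */
    static HttpRequest buildQueryRequest(String endpoint, String sparql) {
        // URLEncoder produces form encoding ('+' for spaces); switch to %20 so the
        // value is also valid as a generic URI query component. A literal '+' in the
        // query is already encoded as %2B, so this replacement is safe.
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8).replace("+", "%20");
        return HttpRequest.newBuilder()
                .uri(URI.create(endpoint + "?query=" + encoded))
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder endpoint; pass a real one as the first argument.
        String endpoint = args.length > 0 ? args[0] : "https://example.org/sparql";
        String query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5";
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildQueryRequest(endpoint, query), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

Libraries such as Jena wrap this protocol in higher-level query APIs, but the raw form shows that an endpoint is simply an HTTP service.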
There are many triple stores available with persistent storage, both database-backed and otherwise (Jena + SDB, Mulgara, Virtuoso, Oracle, etc.). You could certainly extend Mulgara through its resolvers to support queries against your custom database, but I think that's probably a lot of work for not much real value.
I'm sure you could use existing concepts to handle your notion of categories in RDF or perhaps by layering something over Jena.
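One way to see that display categories need no new language concept: state each category as an ordinary triple about a predicate, then group predicates by that value when rendering. The Java sketch below assumes a hypothetical `urn:example:category` property and made-up `ex:` predicates; a real application might reuse an existing vocabulary property instead. If a predicate is given several categories, the last one read wins.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Groups predicates under display categories that are themselves plain triples. */
public class PredicateCategorySketch {

    record Triple(String s, String p, String o) {}

    // Hypothetical property linking a predicate to its display category.
    static final String CATEGORY = "urn:example:category";
    static final String UNCATEGORISED = "(uncategorised)";

    /** Maps each category to the predicates used in the data that belong to it. */
    static SortedMap<String, SortedSet<String>> groupByCategory(List<Triple> data, List<Triple> annotations) {
        Map<String, String> categoryOf = new HashMap<>();
        for (Triple t : annotations) {
            if (t.p().equals(CATEGORY)) {
                categoryOf.put(t.s(), t.o()); // last category wins if there are several
            }
        }
        SortedMap<String, SortedSet<String>> groups = new TreeMap<>();
        for (Triple t : data) {
            String category = categoryOf.getOrDefault(t.p(), UNCATEGORISED);
            groups.computeIfAbsent(category, k -> new TreeSet<>()).add(t.p());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Triple> data = List.of(
                new Triple("ex:school1", "ex:phone", "0123"),
                new Triple("ex:school1", "ex:email", "office@example.org"),
                new Triple("ex:school1", "ex:founded", "1900"));
        List<Triple> annotations = List.of(
                new Triple("ex:phone", CATEGORY, "Contact details"),
                new Triple("ex:email", CATEGORY, "Contact details"));
        groupByCategory(data, annotations)
                .forEach((category, predicates) -> System.out.println(category + ": " + predicates));
    }
}
```

Because the category triples are ordinary RDF, they can live in the same store (or a separate named graph) and be fetched with the same queries as everything else.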