How can I improve the SPARQL query performance of SDB?
In my application I use Jena's SDB as the SPARQL store, with DB2 as the database server, but I find that SPARQL query performance is very poor.
Can anyone help me solve this problem? How can I improve SPARQL query performance, especially with SDB?
Below are my test case data and the SPARQL query:
Test case:
The total number of RDF triples is 13294. The query result count is 420.
The query took 42 seconds.
The SPARQL query is:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?s ?name ?ownerId ?status ?time
?value ?startTime ?endTime ?description
WHERE
{
?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "http://www.w3c.com/schemas/cp#Event" .
?s <http://www.w3c.com/schemas/cp#time> ?time .
?s <http://www.w3c.com/schemas/cp#ownerId> ?ownerId .
?s <http://www.w3c.com/schemas/cp#name> ?name .
?s <http://www.w3c.com/schemas/cp#value> ?value .
?s <http://www.w3c.com/schemas/cp#_status> ?status .
?s <http://www.w3c.com/schemas/cp#start_Time> ?startTime .
?s <http://www.w3c.com/schemas/cp#end_Time> ?endTime .
?s <http://www.w3c.com/schemas/cp#description> ?description .
FILTER(xsd:dateTime(?time) >= "2011-08-12T00:00:00"^^xsd:dateTime
&& xsd:dateTime(?time) <= "2011-09-18T23:59:59"^^xsd:dateTime)
}
Comments (1)
The query performance of any SQL-backed triplestore like SDB is always going to be worse than that of a native triplestore, because an SQL-backed triplestore has to down-compile SPARQL into SQL, which often creates horrendously complex SQL queries.
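To make the down-compilation concrete, here is a minimal Python sketch of how a group of triple patterns sharing one subject could be turned into a self-join over a relational table. It is an illustration only, not SDB's actual translator: the single triples(s, p, o) table, the bgp_to_sql helper, and the sample rows are all assumptions made for the example, and SDB's real schema and generated SQL differ.

```python
import sqlite3

CP = "http://www.w3c.com/schemas/cp#"


def bgp_to_sql(patterns):
    """Translate (predicate, variable) pairs that share one subject into SQL.

    Each triple pattern gets its own alias of the triples table, and the
    shared subject becomes the join condition between the aliases.
    """
    select = ["t0.s AS s"]
    joins = ["FROM triples t0"]
    where = []
    params = []
    for i, (predicate, var) in enumerate(patterns):
        alias = f"t{i}"
        if i > 0:
            joins.append(f"INNER JOIN triples {alias} ON {alias}.s = t0.s")
        where.append(f"{alias}.p = ?")
        params.append(predicate)
        select.append(f'{alias}.o AS "{var}"')
    sql = (
        f"SELECT DISTINCT {', '.join(select)} "
        f"{' '.join(joins)} WHERE {' AND '.join(where)}"
    )
    return sql, params


# Hypothetical sample data in a simplified one-table layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("e1", CP + "name", "Backup"),
        ("e1", CP + "time", "2011-08-15T10:00:00"),
        ("e1", CP + "ownerId", "u1"),
        ("e2", CP + "name", "Deploy"),
        ("e2", CP + "time", "2011-10-01T09:00:00"),
        ("e2", CP + "ownerId", "u2"),
    ],
)

patterns = [(CP + "time", "time"), (CP + "ownerId", "ownerId"), (CP + "name", "name")]
sql, params = bgp_to_sql(patterns)
results = sorted(conn.execute(sql, params).fetchall())
print(sql)
print(results)
```

Even in this simplified layout every extra triple pattern adds another alias of the same table to the join, so the SQL grows with the number of patterns in the query.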
So, taking your example, you've asked for 9 triple patterns to be matched, which will generate an SQL SELECT containing 9 INNER JOIN operations, and that will take a lot of time to start with. Then you are applying a FILTER to those triple patterns. The problem with this is that unless the filter expression is very simple, or close enough to SQL to be converted into it, a FILTER has to be evaluated in Java code in memory. What this means in practice is that you are selecting out all the possible events in the triplestore and then filtering for the date range in memory using Java, which is always going to make your query slower.
Unless there is a specific reason you want to use SDB, I'd really suggest looking at Jena's native triple store, TDB or TDB2. It is designed to do the types of joins required by SPARQL queries much more efficiently, and the way it stores the data allows it to do more complicated filters like your date range one much, much faster.
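The difference between a filter evaluated in application memory and one handed to the database can be sketched in a few lines of Python. This is only an analogy for the behaviour described above, not a reproduction of what SDB does internally: the events table, its columns, and the sample rows are hypothetical, and SQLite stands in for DB2.

```python
import sqlite3
from datetime import datetime

# Hypothetical event data; timestamps are ISO 8601 strings in one format,
# so string order in SQL matches chronological order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (s TEXT, time TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("e1", "2011-08-01T12:00:00"),
        ("e2", "2011-08-12T00:00:00"),
        ("e3", "2011-09-01T08:30:00"),
        ("e4", "2011-09-18T23:59:59"),
        ("e5", "2011-10-05T17:45:00"),
    ],
)

start = "2011-08-12T00:00:00"
end = "2011-09-18T23:59:59"

# Strategy A: fetch every event, then apply the date range in memory.
# All rows cross the database boundary before any of them is discarded.
all_rows = conn.execute("SELECT s, time FROM events ORDER BY s").fetchall()
lo, hi = datetime.fromisoformat(start), datetime.fromisoformat(end)
in_memory = [s for s, t in all_rows if lo <= datetime.fromisoformat(t) <= hi]

# Strategy B: push the range condition into the SQL itself, so the
# database returns only the matching rows.
pushed = [
    s
    for (s,) in conn.execute(
        "SELECT s FROM events WHERE time BETWEEN ? AND ? ORDER BY s",
        (start, end),
    )
]

print(len(all_rows), in_memory, pushed)
```

Both strategies return the same events, but the first one has to materialise the whole table before filtering, which is the cost that grows as the store gets larger.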