Optimizing a SPARQL query [The estimated execution time exceeds the limit of 1500 (sec)]
I am trying to run this query on http://dbpedia.org/sparql, but I get an error saying my query is too expensive. When I run the query through http://dbpedia.org/snorql/ I get:
The estimated execution time 25012730 (sec) exceeds the limit of 1500 (sec) ...
When I run the query through my Python script using SPARQLWrapper, I simply get an HTTP 500.
I figure I need to do something to optimize my SPARQL query. I need the data so I can iterate over educational institutions and import them into a local database. Maybe I am using SPARQL wrong and should do this in a fundamentally different way.
Hope someone can help me!
The query:
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX : <http://dbpedia.org/resource/>
PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX dbpedia: <http://dbpedia.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT DISTINCT ?uri
?name
?homepage
?student_count
?native_name
?city
?country
?type
?lat ?long
?image
WHERE {
?uri rdf:type dbpedia-owl:EducationalInstitution .
?uri foaf:name ?name .
OPTIONAL { ?uri foaf:homepage ?homepage } .
OPTIONAL { ?uri dbpedia-owl:numberOfStudents ?student_count } .
OPTIONAL { ?uri dbpprop:nativeName ?native_name } .
OPTIONAL { ?uri dbpprop:city ?city } .
OPTIONAL { ?uri dbpprop:country ?country } .
OPTIONAL { ?uri dbpprop:type ?type } .
OPTIONAL { ?uri geo:lat ?lat . ?uri geo:long ?long } .
OPTIONAL { ?uri foaf:depiction ?image } .
}
ORDER BY ?uri
LIMIT 20 OFFSET 10
Forget it. You won't be able to get that result back from DBpedia with a single SPARQL query. Those OPTIONALs are very expensive.
To work around it, you need to first run something like:
Then iterate over the result set of this query to form a single query for each dbpedia-owl:EducationalInstitution, such as ... (notice the filter at the end of the query):
where <http://dbpedia.org/resource/%C3%89cole_%C3%A9l%C3%A9mentaire_Marie-Curie> has been obtained from the first query.
And yes, it will be slow, and you might not be able to run this for an online application. Advice: try to work out some sort of caching mechanism to sit between your app and the DBpedia SPARQL endpoint.
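The two-step approach plus a cache could look roughly like the Python sketch below. This is not the answerer's original code, which did not survive on this page; the helper names (`list_query`, `detail_query`, `run_query`, `CachedDetails`), the choice of properties in the detail query, and the in-memory dict cache are all assumptions made for illustration. It uses only the standard library instead of SPARQLWrapper so that it stays self-contained.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, List

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint named in the question

PREFIXES = (
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
    "PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>\n"
)


def list_query(limit: int, offset: int) -> str:
    """Step 1: a cheap query that returns only institution URIs."""
    return PREFIXES + (
        "SELECT DISTINCT ?uri WHERE {\n"
        "  ?uri rdf:type dbpedia-owl:EducationalInstitution .\n"
        "}\n"
        f"ORDER BY ?uri LIMIT {limit} OFFSET {offset}"
    )


def detail_query(uri: str) -> str:
    """Step 2: optional properties for one URI, pinned by a FILTER at the end."""
    return PREFIXES + (
        "SELECT ?name ?homepage ?student_count WHERE {\n"
        "  ?uri foaf:name ?name .\n"
        "  OPTIONAL { ?uri foaf:homepage ?homepage }\n"
        "  OPTIONAL { ?uri dbpedia-owl:numberOfStudents ?student_count }\n"
        f"  FILTER (?uri = <{uri}>)\n"
        "}"
    )


def run_query(query: str) -> List[dict]:
    """Send one query to the endpoint and return the JSON result bindings."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["results"]["bindings"]


class CachedDetails:
    """Hypothetical in-memory cache sitting between the app and the endpoint."""

    def __init__(self, run: Callable[[str], List[dict]] = run_query):
        self._run = run
        self._cache: Dict[str, List[dict]] = {}

    def get(self, uri: str) -> List[dict]:
        # Only hit the endpoint the first time a URI is requested.
        if uri not in self._cache:
            self._cache[uri] = self._run(detail_query(uri))
        return self._cache[uri]
```

In use, you would call `run_query(list_query(100, 0))` to get a batch of URIs and then `CachedDetails().get(row["uri"]["value"])` for each row. For an import job, persisting the cache to disk or to the local database is probably more useful than a dict.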
Don't try to get the entire dataset at once! Add a LIMIT and an OFFSET clause and use those to page through the data.
With LIMIT 50 added I get back a result for your query almost instantly. I managed to get the limit up much higher than that and still get a response, so play with it. Once you've found a page size that works for you, just repeat the query with an OFFSET as well until you get no more results, e.g.
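A minimal Python sketch of that LIMIT/OFFSET paging loop follows. The function names and the injected `run_query` callable are assumptions for illustration, not part of any library: `run_query` stands for whatever sends a query string to the endpoint and returns a list of rows (for example a thin wrapper around SPARQLWrapper). The base query should keep its ORDER BY, otherwise pages are not guaranteed to be stable between requests.

```python
from typing import Callable, Iterator, List


def paged_query(base_query: str, limit: int, offset: int) -> str:
    """Append LIMIT/OFFSET to a query that has no paging clauses yet."""
    return f"{base_query}\nLIMIT {limit} OFFSET {offset}"


def iter_all_rows(
    base_query: str,
    run_query: Callable[[str], List[dict]],
    page_size: int = 50,
) -> Iterator[dict]:
    """Repeat the query with a growing OFFSET until a page comes back empty."""
    offset = 0
    while True:
        rows = run_query(paged_query(base_query, page_size, offset))
        if not rows:
            return
        yield from rows
        offset += page_size
```

Because `run_query` is passed in, you can tune `page_size` against the live endpoint and still test the loop offline with a fake function.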
If you know the exact URI (e.g. from a previous query), then putting the URI directly in the WHERE clause is faster (at least in my experience) than putting the URI in a FILTER.
E.g., prefer:
over
Also, I've found that UNIONs actually perform faster than filters designed to match multiple resources.
Just because we're doing SPARQL now doesn't mean we can forget the nightmares of SQL tuning. Welcome to the wonderful world of SPARQL tuning! :)
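To make the difference concrete, here is a small Python sketch that builds the three query shapes discussed above: the URI pinned by a FILTER, the URI written directly in the triple pattern, and a UNION over several known URIs. The original examples from this answer were not preserved on this page, so the function names and the choice of `foaf:name` as the property are assumptions for illustration only.

```python
from typing import Iterable

FOAF = "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"


def name_query_with_filter(uri: str) -> str:
    """Slower shape (per the answer): match everything, then filter on the URI."""
    return FOAF + (
        "SELECT ?name WHERE {\n"
        "  ?uri foaf:name ?name .\n"
        f"  FILTER (?uri = <{uri}>)\n"
        "}"
    )


def name_query_direct(uri: str) -> str:
    """Preferred shape: the known URI is the subject of the triple pattern."""
    return FOAF + (
        "SELECT ?name WHERE {\n"
        f"  <{uri}> foaf:name ?name .\n"
        "}"
    )


def name_query_union(uris: Iterable[str]) -> str:
    """One UNION branch per known URI instead of a multi-value FILTER."""
    branches = [f"  {{ <{u}> foaf:name ?name }}" for u in uris]
    if not branches:
        raise ValueError("at least one URI is required")
    return FOAF + "SELECT ?name WHERE {\n" + "\n  UNION\n".join(branches) + "\n}"
```

Whether the direct and UNION forms are actually faster depends on the endpoint's query planner, so it is worth timing them against DBpedia rather than taking the ordering for granted.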