DBpedia query returns certain musicals multiple times despite filters

Posted on 2024-10-19 11:23:26

I'm trying to use a SPARQL query against DBpedia to retrieve a list of musicals and some associated properties. However, despite using the appropriate filters (as far as I can tell), the results include many of the musicals more than once. Here is my query:

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbpprop: <http://dbpedia.org/property/>
    SELECT ?label ?abstract ?book ?music ?lyrics
    WHERE { 
        ?play <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Broadway_musicals> ;
            rdfs:label ?label ;
            dbo:abstract ?abstract ;
            dbpprop:book ?book ;
            dbpprop:lyrics ?lyrics ;
            dbpprop:music ?music .
        FILTER (LANG(?label) = 'en')    
        FILTER (LANG(?abstract) = 'en')
        FILTER (LANG(?book) = 'en')
        FILTER (LANG(?lyrics) = 'en')
        FILTER (LANG(?music) = 'en')
    }

The resulting list has many duplicate entries. If you paste the query into the DBpedia SPARQL Explorer, you'll see that, starting with 'Mamma Mia!', there are a lot of duplicates in the list.

Any idea what I'm missing to get unique results with no duplicates? Thanks!

[Edited by glenn mcdonald to clarify that it's musicals which are "duplicated" here, not triples.]

Comments (2)

满身野味 2024-10-26 11:23:26

SPARQL returns variable bindings. Your "duplicates" are Cartesian products of the multi-valued properties you project. Mamma Mia! has multiple music writers and multiple lyricists, so you get one row for every possible combination of them.
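One way to see the multiplication directly is to count the values per musical. This is only a sketch: it assumes the endpoint supports SPARQL 1.1 aggregates, it leaves out the label and abstract to keep it short, and the output names `?books`, `?lyricists`, `?composers` and `?rows` are made up for illustration.

```sparql
PREFIX dbpprop: <http://dbpedia.org/property/>
# For each musical, count the distinct values of each property and the
# number of solutions the triple pattern produces for it.
SELECT ?play
       (COUNT(DISTINCT ?book) AS ?books)
       (COUNT(DISTINCT ?lyrics) AS ?lyricists)
       (COUNT(DISTINCT ?music) AS ?composers)
       (COUNT(*) AS ?rows)
WHERE {
    ?play <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Broadway_musicals> ;
        dbpprop:book ?book ;
        dbpprop:lyrics ?lyrics ;
        dbpprop:music ?music .
}
GROUP BY ?play
# Keep only the musicals that come back more than once.
HAVING (COUNT(*) > 1)
```

For each musical listed, `?rows` should equal `?books` times `?lyricists` times `?composers`, which is the Cartesian product described above.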

What a pain, huh? The "solution" is to use CONSTRUCT instead of SELECT, and deal with getting back a graph instead of a table. Maybe like this:

http://dbpedia.org/snorql/?query=PREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0D%0A++++PREFIX+dbo%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E%0D%0A++++PREFIX+dbpprop%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fproperty%2F%3E%0D%0A++++CONSTRUCT+%7B%0D%0A++++++++%3Fplay+rdfs%3Alabel+%3Flabel+%3B%0D%0A++++++++++++dbo%3Aabstract+%3Fabstract+%3B%0D%0A++++++++++++dbpprop%3Abook+%3Fbook+%3B%0D%0A++++++++++++dbpprop%3Alyrics+%3Flyrics+%3B%0D%0A++++++++++++dbpprop%3Amusic+%3Fmusic+.%0D%0A++++%7D%0D%0A++++WHERE+%7B+%0D%0A++++++++%3Fplay+%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Fterms%2Fsubject%3E+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FCategory%3ABroadway_musicals%3E+%3B%0D%0A++++++++++++rdfs%3Alabel+%3Flabel+%3B%0D%0A++++++++++++dbo%3Aabstract+%3Fabstract+%3B%0D%0A++++++++++++dbpprop%3Abook+%3Fbook+%3B%0D%0A++++++++++++dbpprop%3Alyrics+%3Flyrics+%3B%0D%0A++++++++++++dbpprop%3Amusic+%3Fmusic+.%0D%0A++++++++FILTER+%28LANG%28%3Flabel%29+%3D+%27en%27%29++++%0D%0A++++++++FILTER+%28LANG%28%3Fabstract%29+%3D+%27en%27%29%0D%0A++++++++FILTER+%28LANG%28%3Fbook%29+%3D+%27en%27%29%0D%0A++++++++FILTER+%28LANG%28%3Flyrics%29+%3D+%27en%27%29%0D%0A++++++++FILTER+%28LANG%28%3Fmusic%29+%3D+%27en%27%29%0D%0A++++%7D
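Decoded from the link above for readability, the query is the original one with the SELECT line replaced by a CONSTRUCT template. Nothing else is changed:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>
# Template: the triples to emit for every solution of the WHERE clause.
CONSTRUCT {
    ?play rdfs:label ?label ;
        dbo:abstract ?abstract ;
        dbpprop:book ?book ;
        dbpprop:lyrics ?lyrics ;
        dbpprop:music ?music .
}
WHERE {
    ?play <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Broadway_musicals> ;
        rdfs:label ?label ;
        dbo:abstract ?abstract ;
        dbpprop:book ?book ;
        dbpprop:lyrics ?lyrics ;
        dbpprop:music ?music .
    FILTER (LANG(?label) = 'en')
    FILTER (LANG(?abstract) = 'en')
    FILTER (LANG(?book) = 'en')
    FILTER (LANG(?lyrics) = 'en')
    FILTER (LANG(?music) = 'en')
}
```

Because the result is a graph, which is a set of triples, each book, lyrics and music value appears once per musical rather than once per combination.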

煮茶煮酒煮时光 2024-10-26 11:23:26

Are the duplicates exact duplicates, i.e. is every value for every variable identical in each duplicate result?

If so, then add the DISTINCT keyword after SELECT to force the SPARQL engine to discard duplicate solutions.
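Applied to the query from the question, that would look roughly like this. Only the SELECT line differs:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>
# DISTINCT drops rows that are identical in every projected variable.
SELECT DISTINCT ?label ?abstract ?book ?music ?lyrics
WHERE {
    ?play <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Broadway_musicals> ;
        rdfs:label ?label ;
        dbo:abstract ?abstract ;
        dbpprop:book ?book ;
        dbpprop:lyrics ?lyrics ;
        dbpprop:music ?music .
    FILTER (LANG(?label) = 'en')
    FILTER (LANG(?abstract) = 'en')
    FILTER (LANG(?book) = 'en')
    FILTER (LANG(?lyrics) = 'en')
    FILTER (LANG(?music) = 'en')
}
```

Rows that differ in even one column, for example the same musical with a different lyricist, are not duplicates as far as DISTINCT is concerned and will still be returned.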

If not, then Glenn is entirely correct: because multiple values are given for the various properties, you will get multiple results. There are complex workarounds you can do with subqueries, GROUP BY etc., but they tend to lead to less efficient queries. Sometimes you just have to deal with the duplicates on the client side.
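As a rough sketch of the GROUP BY route, assuming the endpoint supports SPARQL 1.1 aggregates, the multi-valued properties can be folded into one delimited string per musical. The output names `?books`, `?lyricists` and `?composers` and the `"; "` separator are arbitrary choices for illustration:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>
# One row per musical: each multi-valued property is concatenated into a
# single string instead of multiplying the number of rows.
SELECT ?play ?label ?abstract
       (GROUP_CONCAT(DISTINCT STR(?book); SEPARATOR="; ") AS ?books)
       (GROUP_CONCAT(DISTINCT STR(?lyrics); SEPARATOR="; ") AS ?lyricists)
       (GROUP_CONCAT(DISTINCT STR(?music); SEPARATOR="; ") AS ?composers)
WHERE {
    ?play <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Broadway_musicals> ;
        rdfs:label ?label ;
        dbo:abstract ?abstract ;
        dbpprop:book ?book ;
        dbpprop:lyrics ?lyrics ;
        dbpprop:music ?music .
    FILTER (LANG(?label) = 'en')
    FILTER (LANG(?abstract) = 'en')
    FILTER (LANG(?book) = 'en')
    FILTER (LANG(?lyrics) = 'en')
    FILTER (LANG(?music) = 'en')
}
GROUP BY ?play ?label ?abstract
```

The DISTINCT inside each GROUP_CONCAT matters: without it, every value would be repeated once for each combination of the other two properties. Whether this is slower than merging rows on the client depends on the endpoint.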
