Get all Wikipedia infobox templates and all pages using them

Posted on 2024-12-13 13:24:27 · 1831 words · 0 views · 0 comments

Wikipedia pages such as Wikipedia: Stack Overflow often have infoboxes (usually on the right-hand side at the top of the page). Example screenshot:

[Image: Stack Overflow infobox at Wikipedia]

  1. DBPedia lists all these attributes as RDF triples; see DBPedia: Stack Overflow for an example. There you can see the property dbpprop:wikiPageUsesTemplate with the value dbpedia:Template:Infobox_website, which is interesting. I want to know which Wikipedia pages use this template. How can I list all pages that use the Infobox_website template? Preferably with a SPARQL query, but I am open to other easy solutions.

  2. Next, I need a list of all infobox templates. Wikipedia: Category Infobox Templates shows the hierarchy of the desired Wikipedia categories, which looks like what I am seeking, but I want all of these in a machine-readable format on one page. Maybe DBPedia is the right tool here too? At DBPedia: Category Infobox Templates and DBPedia: INFOBOX I find very little information, although these look very promising. How can I use SPARQL to find all infobox types so that I can repeat step 1 for each of them?

You can use this for testing the SPARQL queries: http://dbpedia.org/snorql/

Update 1

I seem to have solved problem number 1: SPARQL: list all pages with Infobox_website

Update 2

Also, this seems to be the query for problem number 2: SPARQL: list all Infoboxes

Comments (3)

各自安好 2024-12-20 13:24:27

OK, since I seem to have found a solution (most probably not the best one), I want to share it.

1) This SPARQL query can be used to find all pages that include a specific Infobox type:

SELECT * WHERE {
  ?page dbpedia2:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:Infobox_website> .
  ?page dbpedia2:name ?name .
}

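Link at SNORQL: http://dbpedia.org/snorql/?query=SELECT%20%2a%20WHERE%20%7B%20%20?page%20dbpedia2%3awikiPageUsesTemplate%20%3Chttp://dbpedia.org/resource/Template%3aInfobox_website%3E%20.%20%20?page%20dbpedia2%3aname%20?name%20.%7D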


2) This SPARQL query can be used to find all Infobox types:

SELECT DISTINCT ?template WHERE {
  ?page dbpedia2:wikiPageUsesTemplate ?template .
  # str() turns the template IRI into a string so that regex() can match it
  FILTER (regex(str(?template), "Infobox")) .
}
ORDER BY ?template

是伱的 2024-12-20 13:24:27

The queries in the previous answers seem to have stopped working. Only a small change is required to get them working at the new DBpedia query endpoint, http://live.dbpedia.org/sparql: the property prefix is dbpprop: instead of dbpedia2:.

To get a list of all pages and the templates they use, this query works:

SELECT * WHERE {  ?page  dbpprop:wikiPageUsesTemplate ?template . }

If you're looking for a specific template:

SELECT * WHERE {  
   ?page  
   dbpprop:wikiPageUsesTemplate 
   <http://dbpedia.org/resource/Template:Infobox_website> . 
}

And for my use case I'm interested in the Wikipedia URL rather than the DBPedia page, so I'm using this query:

SELECT ?wikipedia_url WHERE {  
   ?page  
   dbpprop:wikiPageUsesTemplate 
   <http://dbpedia.org/resource/Template:Infobox_website> . 
   ?page foaf:isPrimaryTopicOf ?wikipedia_url .
}

I'm also using curl to pull the results into a script:

$ curl -s "http://live.dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+%3Fwikipedia_url+WHERE+%7B+%0D%0A%09+%3Fpage+%0D%0A%09+dbpprop%3AwikiPageUsesTemplate+%0D%0A%09+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FTemplate%3AInfobox_website%3E+.+%0D%0A+%3Fpage+foaf%3AisPrimaryTopicOf+%3Fwikipedia_url+.%0D%0A%0D%0A%09%7D&format=text%2Ftab-separated-values" \
| tr -d \" | grep -v "^wikipedia_url$" | head
http://en.wikipedia.org/wiki/U.S._News_&_World_Report
http://en.wikipedia.org/wiki/FriendFinder
http://en.wikipedia.org/wiki/Debkafile
http://en.wikipedia.org/wiki/GTPlanet
http://en.wikipedia.org/wiki/Lithuanian_Wikipedia
http://en.wikipedia.org/wiki/Connexions
http://en.wikipedia.org/wiki/Hypno5ive
http://en.wikipedia.org/wiki/Scoop_(website)
http://en.wikipedia.org/wiki/Bhoomi_(software)
http://en.wikipedia.org/wiki/Brainwashed_(website)

I'm not sure if this gives the full result set though, because it returns 1698 results whereas wmflabs.org seems to suggest there should be 4439.
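
The curl pipeline above can also be sketched in Python. This is only an illustration, not the script the answer refers to: the function names are hypothetical, the PREFIX lines are an assumption that spells out what the endpoint is expected to predefine for dbpprop: and foaf:, and the endpoint URL is simply the one quoted above, which may no longer be available.

```python
import csv
import io
import urllib.parse
import urllib.request

# Endpoint quoted in the answer above; it may have moved since then.
ENDPOINT = "http://live.dbpedia.org/sparql"

# Assumption: these PREFIX lines match the prefixes the endpoint predefines.
QUERY = """
PREFIX dbpprop: <http://dbpedia.org/property/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?wikipedia_url WHERE {
  ?page dbpprop:wikiPageUsesTemplate
        <http://dbpedia.org/resource/Template:Infobox_website> .
  ?page foaf:isPrimaryTopicOf ?wikipedia_url .
}
"""


def build_request_url(query, endpoint=ENDPOINT):
    """Return the GET URL for a SPARQL query, asking for TSV output."""
    params = {
        "default-graph-uri": "http://dbpedia.org",
        "query": query,
        "format": "text/tab-separated-values",
    }
    return endpoint + "?" + urllib.parse.urlencode(params)


def parse_tsv(text):
    """Return the first column of a TSV result, skipping the header row."""
    # csv.reader removes the double quotes, like `tr -d \\"` in the pipeline.
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    next(rows, None)  # header row, e.g. "wikipedia_url"
    return [row[0] for row in rows if row]


def fetch_wikipedia_urls(query=QUERY):
    """Run the query over HTTP and return the URLs; needs network access."""
    with urllib.request.urlopen(build_request_url(query)) as response:
        return parse_tsv(response.read().decode("utf-8"))
```

Calling fetch_wikipedia_urls() returns the whole list, so len() on the result gives the count to compare against other sources.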


For the second part of your question, only a small change is needed from the previous query to get a list of all templates:

SELECT DISTINCT ?template WHERE {
    ?page dbpprop:wikiPageUsesTemplate ?template .
    # str() turns the template IRI into a string so that regex() can match it
    FILTER (regex(str(?template), "Infobox")) .
}
ORDER BY ?template
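
To repeat step 1 for each template returned by this query, one option is to generate a per-template query from each template IRI. The following is a minimal Python sketch under stated assumptions: pages_query and queries_for are hypothetical helper names, and the PREFIX line assumes that dbpprop: stands for http://dbpedia.org/property/.

```python
# Every DBpedia template IRI seen in this thread starts with this prefix.
TEMPLATE_PREFIX = "http://dbpedia.org/resource/Template:"

# Characters that must not appear inside <...> in a SPARQL query.
FORBIDDEN = set('<>"{}|\\^` \t\r\n')


def pages_query(template_iri):
    """Return a SPARQL query listing the pages that use one template."""
    if not template_iri.startswith(TEMPLATE_PREFIX) or FORBIDDEN & set(template_iri):
        raise ValueError("not a usable DBpedia template IRI: " + repr(template_iri))
    return (
        # Assumption: the namespace behind the dbpprop: prefix.
        "PREFIX dbpprop: <http://dbpedia.org/property/>\n"
        "SELECT ?page WHERE {\n"
        "  ?page dbpprop:wikiPageUsesTemplate <" + template_iri + "> .\n"
        "}"
    )


def queries_for(template_iris):
    """Map each template IRI to the query that lists its pages."""
    return {iri: pages_query(iri) for iri in template_iris}
```

Each generated query can then be sent to the endpoint in the same way as the single-template query.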

皓月长歌 2024-12-20 13:24:27

You can also use the MediaWiki API's embeddedin query to return a list of all pages that include a given template. You'll want to use a library for accessing the API, though; which language would you prefer? For Ruby, I'd suggest MediaWiki::Gateway.
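
For illustration, here is a rough Python sketch of the embeddedin approach that uses only the standard library instead of a wrapper library such as MediaWiki::Gateway. The function names, the User-Agent string, the restriction to namespace 0 (articles), and the English Wikipedia API URL are assumptions made for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the English Wikipedia API endpoint.
API_URL = "https://en.wikipedia.org/w/api.php"


def embeddedin_params(template_title, continuation=None):
    """Build the parameters for one list=embeddedin request."""
    params = {
        "action": "query",
        "list": "embeddedin",
        "eititle": template_title,  # e.g. "Template:Infobox website"
        "einamespace": "0",         # assumption: only article pages are wanted
        "eilimit": "max",
        "format": "json",
    }
    # A follow-up request repeats the values from the "continue" object.
    params.update(continuation or {})
    return params


def fetch_json(params):
    """Send one API request and decode the JSON reply; needs network access."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Placeholder User-Agent for this sketch.
    request = urllib.request.Request(url, headers={"User-Agent": "infobox-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def pages_embedding(template_title, fetch=fetch_json):
    """Yield the title of every page that embeds the given template."""
    continuation = None
    while True:
        data = fetch(embeddedin_params(template_title, continuation))
        for page in data["query"]["embeddedin"]:
            yield page["title"]
        continuation = data.get("continue")
        if not continuation:
            break
```

Passing the fetch function as a parameter keeps the paging logic separate from the HTTP call, so the loop can be tried against canned replies before querying the live API.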
